Information processing device

ABSTRACT

In order to enable derivation of useful subsets X₁, X₂, . . . , X_(T) also for an online submodular optimization problem for which a fixed strategy is not effective, an information processing apparatus (1) includes: an objective function setting unit (11) that sets, as an objective function f_(t) in each round t∈[T], a submodular function on a power set 2^(S) of a set S consisting of n elements; and a subset sequence derivation unit (12) that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*, X_(t+1)*)≤V is not more than an upper limit Max (n,T,V).

TECHNICAL FIELD

The present invention relates to an information processing apparatus that solves an online submodular optimization problem.

BACKGROUND ART

Use of online submodular optimization is being considered for determining advertisements to be presented to a user in web advertising and for determining a product to be sold at a discount in web sales. Online submodular optimization refers to selecting a subset of a given set in each round in order to minimize or maximize a cumulative value of an objective function.

Examples of a known document related to online submodular minimization include Non-patent Literature 1. Non-patent Literature 1 discloses an algorithm for deriving subsets X₁, X₂, . . . , X_(T) for which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−min_(X∈2^(S)){Σ_(t∈[T])f_(t)(X)} is not more than O((nT)^(1/2)). Note here that f_(t) represents an objective function in a round t.

CITATION LIST

Non-Patent Literature

[Non-patent Literature 1]

E. Hazan and S. Kale, “Online Submodular Minimization”, Journal of Machine Learning Research 13 (2012) 2903-2922

SUMMARY OF INVENTION

Technical Problem

In the method disclosed in Non-patent Literature 1, subsets X₁, X₂, . . . , X_(T) are derived for which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−min_(X∈2^(S)){Σ_(t∈[T])f_(t)(X)} is not more than O((nT)^(1/2)). This causes the following problem. Because this regret compares the derived subsets only with the best single subset fixed over all rounds, useful subsets X₁, X₂, . . . , X_(T) can be derived for an online submodular minimization problem for which a fixed strategy of selecting the same subset in all rounds is effective, whereas useful subsets X₁, X₂, . . . , X_(T) cannot be derived for an online submodular minimization problem for which a fixed strategy is not effective. An online submodular maximization problem has a similar problem.

An example aspect of the present invention has been made in view of the above problem, and an example object thereof is to provide an information processing apparatus that makes it possible to derive useful subsets X₁, X₂, . . . , X_(T) also for an online submodular optimization problem for which a fixed strategy is not effective.

Solution to Problem

An information processing apparatus in accordance with an aspect of the present invention includes: an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*, X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,

where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|, that is, the number of elements that belong to exactly one of the two subsets.
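As a minimal sketch (illustrative only, not part of the claimed apparatus), the Hamming distance d_(H) and the constraint Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V can be evaluated directly on Python frozensets. The helper names `hamming_distance`, `path_length`, and `is_admissible` are hypothetical.

```python
from typing import FrozenSet, Sequence


def hamming_distance(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """d_H(A, B) = |A ∪ B| - |A ∩ B|, i.e. the size of the symmetric difference."""
    return len(a | b) - len(a & b)


def path_length(benchmark: Sequence[FrozenSet[int]]) -> int:
    """Sum of d_H(X_t*, X_{t+1}*) over consecutive rounds t = 1, ..., T-1."""
    return sum(hamming_distance(benchmark[t], benchmark[t + 1])
               for t in range(len(benchmark) - 1))


def is_admissible(benchmark: Sequence[FrozenSet[int]], V: int) -> bool:
    """True if the benchmark changes at most V elements in total."""
    return path_length(benchmark) <= V


bench = [frozenset({1, 2}), frozenset({2, 3}), frozenset({2, 3})]
print(path_length(bench))       # 2
print(is_admissible(bench, 1))  # False
```

With V=0 only constant benchmarks are admissible, which corresponds to the fixed comparator used in Non-patent Literature 1; a larger V admits benchmarks that change over time.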

An information processing apparatus in accordance with an aspect of the present invention includes: an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T) satisfying the following condition β1 or β2:

the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0,

the condition β2 being that the asymptotic behavior of the expected value of the α regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,

where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.
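For the maximization aspect, the quantities entering condition β1 can be sketched as follows. This is a hypothetical illustration: the names `alpha_regret` and `satisfies_beta1_benchmark` are not from the source, the value α=0.5 is arbitrary (the text does not fix α here), and the two objectives are simple modular (hence normalized submodular) rewards chosen only for the example.

```python
from typing import Callable, FrozenSet, Sequence

Objective = Callable[[FrozenSet[int]], float]


def alpha_regret(fs: Sequence[Objective], chosen: Sequence[FrozenSet[int]],
                 benchmark: Sequence[FrozenSet[int]], alpha: float) -> float:
    """alpha * sum_t f_t(X_t*) - sum_t f_t(X_t) (maximization setting)."""
    best = sum(f(b) for f, b in zip(fs, benchmark))
    obtained = sum(f(x) for f, x in zip(fs, chosen))
    return alpha * best - obtained


def satisfies_beta1_benchmark(benchmark: Sequence[FrozenSet[int]], k: int, V: int) -> bool:
    """|X_t*| <= k for every t and sum_t d_H(X_t*, X_{t+1}*) <= V."""
    if any(len(b) > k for b in benchmark):
        return False
    switches = sum(len(a | b) - len(a & b) for a, b in zip(benchmark, benchmark[1:]))
    return switches <= V


# Two rounds of modular (hence normalized submodular) rewards.
fs = [lambda X: len(X & {0, 1}), lambda X: len(X & {2, 3})]
chosen = [frozenset({0}), frozenset({0})]
bench = [frozenset({0, 1}), frozenset({2, 3})]
print(satisfies_beta1_benchmark(bench, k=2, V=4))  # True
print(alpha_regret(fs, chosen, bench, alpha=0.5))  # 1.0
```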

Advantageous Effects of Invention

An example aspect of the present invention makes it possible to provide an information processing apparatus that makes it possible to derive useful subsets X₁, X₂, . . . , X_(T) also for an online submodular optimization problem for which a fixed strategy is not effective.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating a configuration of an information processing apparatus in accordance with a first example embodiment.

FIG. 2 is a flow diagram showing a flow of an information processing method in accordance with the first example embodiment.

FIG. 3 is a flow diagram showing a first specific example of a subset sequence derivation process included in the information processing method shown in FIG. 2.

FIG. 4 is a flow diagram showing a second specific example of the subset sequence derivation process included in the information processing method shown in FIG. 2.

FIG. 5 is a block diagram illustrating a configuration of an information processing apparatus in accordance with a second example embodiment.

FIG. 6 is a flow diagram showing a flow of an information processing method in accordance with the second example embodiment.

FIG. 7 is a flow diagram showing a first specific example of a subset sequence derivation process included in the information processing method shown in FIG. 6.

FIG. 8 is a flow diagram showing a second specific example of the subset sequence derivation process included in the information processing method shown in FIG. 6.

FIG. 9 is a block diagram illustrating a configuration of a computer functioning as the information processing apparatus in accordance with the first example embodiment or the second example embodiment.

DESCRIPTION OF EMBODIMENTS

First Example Embodiment

A first example embodiment of the present invention will be described in detail with reference to the drawings.

Online Submodular Minimization Problem

Considered are (i) a set S consisting of n elements and (ii) an objective function f_(t): 2^(S)→I defined for each round t∈[T]. Note here that n and T each represent any natural number. [T] represents the set of natural numbers not less than 1 and not more than T. 2^(S) represents the power set of the set S, that is, the set consisting of all subsets of the set S. I represents a closed interval in the set R of real numbers. In the first example embodiment, it is assumed that I=[−1,1]. This assumption may affect expressions and values to be described below, but the present invention is not limited to the first example embodiment and is therefore not limited by the assumption.

It is assumed that each objective function f_(t) is a submodular function. That is, it is assumed that an inequality f_(t)(X∪{i})−f_(t)(X)≥f_(t)(Y∪{i})−f_(t)(Y) is satisfied for (i) any subsets X,Y∈2^(S) satisfying X⊆Y and (ii) any element i∈S\Y.
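The definition above can be checked by brute force on a tiny ground set. The sketch below is illustrative only: it indexes elements as 0, …, n−1, the helper names are hypothetical, and its cost grows roughly as 4^n·n, so it is usable only for very small n.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List


def all_subsets(n: int) -> List[FrozenSet[int]]:
    """All 2^n subsets of {0, ..., n-1}."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def is_submodular(f: Callable[[FrozenSet[int]], float], n: int, tol: float = 1e-12) -> bool:
    """Check f(X+i) - f(X) >= f(Y+i) - f(Y) for all X ⊆ Y and all i not in Y."""
    subsets = all_subsets(n)
    for Y in subsets:
        for X in subsets:
            if not X <= Y:
                continue
            for i in range(n):
                if i in Y:
                    continue
                if f(X | {i}) - f(X) < f(Y | {i}) - f(Y) - tol:
                    return False
    return True


print(is_submodular(lambda X: min(len(X), 1), 3))   # True
print(is_submodular(lambda X: len(X) ** 2 / 9, 3))  # False
```

Restricting i to S\Y matters: for i∈Y the right-hand side is 0, and a non-monotone submodular function may have a negative marginal gain on the left-hand side.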

Among problems of selecting a subset sequence X₁, X₂, . . . , X_(T)∈2^(S), a problem whose target is minimization of a cumulative value Σ_(t∈[T])f_(t)(X_(t)) of the objective function f_(t) is referred to as an “online submodular minimization problem”. In the first example embodiment, the online submodular minimization problem is studied under the following full-information setting or bandit feedback setting.

Full-information setting: After selecting a subset X_(t) in a round t, it is possible to refer to the value f_(t)(X) of the objective function f_(t) for any subset X∈2^(S).

Bandit feedback setting: After selecting the subset X_(t) in the round t, it is (1) possible to refer to the value f_(t)(X_(t)) of the objective function f_(t) for the selected subset X_(t) and (2) impossible to refer to the value f_(t)(X) of the objective function f_(t) for any subset X∈2^(S) that is different from the selected subset.

Configuration of Information Processing Apparatus

A configuration of an information processing apparatus 1 in accordance with the first example embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a configuration of the information processing apparatus 1.

The information processing apparatus 1 is an apparatus for solving the online submodular minimization problem related to the set S consisting of the n elements. As illustrated in FIG. 1, the information processing apparatus 1 includes an objective function setting unit 11 and a subset sequence derivation unit 12.

The objective function setting unit 11 is a means that sets, as the objective function f_(t) in each round t, a submodular function on the power set 2^(S) of the set S. The objective function setting unit 11 is an example of an “objective function setting means” in the claims. The submodular function that the objective function setting unit 11 sets as the objective function f_(t) may be (i) predetermined, (ii) input by a user via a keyboard or the like, or (iii) input by another apparatus via a communication network or the like. The submodular function that the objective function setting unit 11 sets as the objective function f_(t) may be generated in various processes carried out inside the information processing apparatus 1.

The subset sequence derivation unit 12 is a means that derives a subset sequence X₁, X₂, . . . , X_(T) satisfying a condition α below. The subset sequence derivation unit 12 is an example of a “subset sequence derivation means” in the claims. The subset sequence X₁, X₂, . . . , X_(T) that is derived by the subset sequence derivation unit 12 may be provided to a user via a display or the like, or may be provided to another apparatus via a communication network or the like. The subset sequence X₁, X₂, . . . , X_(T) that is derived by the subset sequence derivation unit 12 may be used in various processes carried out inside the information processing apparatus 1.

The condition α is that an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0, where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Flow of Information Processing Method

A flow of an information processing method S1 in accordance with the first example embodiment will be described with reference to FIG. 2. FIG. 2 is a flow diagram showing the flow of the information processing method S1.

The information processing method S1 is a method for solving the online submodular minimization problem related to the set S consisting of the n elements. As illustrated in FIG. 2, the information processing method S1 includes an objective function setting process S11 and a subset sequence derivation process S12. The information processing method S1 is carried out by, for example, the information processing apparatus 1.

The objective function setting process S11 is a process for setting, as the objective function f_(t) in each round t, the submodular function on the power set 2^(S) of the set S. The objective function setting process S11 is carried out by, for example, the objective function setting unit 11 of the information processing apparatus 1. The subset sequence derivation process S12 is a process for deriving the subset sequence X₁, X₂, . . . , X_(T) satisfying the condition α described in the previous section. The subset sequence derivation process S12 is carried out by, for example, the subset sequence derivation unit 12 of the information processing apparatus 1.

Effect of Information Processing Apparatus and Information Processing Method

In the method disclosed in Non-patent Literature 1, subsets X₁, X₂, . . . , X_(T) are derived that cause an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−min_(X∈2^(S)){Σ_(t∈[T])f_(t)(X)} to be not more than an upper limit Max (n,T) determined in accordance with n,T. Because this regret compares the derived subsets only with the best fixed subset, useful subsets X₁, X₂, . . . , X_(T) can be derived for the online submodular minimization problem for which a fixed strategy of selecting the same subset in all rounds is effective, whereas useful subsets X₁, X₂, . . . , X_(T) cannot be derived for the online submodular minimization problem for which a fixed strategy is not effective.

In contrast, in the information processing apparatus 1 and the information processing method S1 in accordance with the first example embodiment, the subsets X₁, X₂, . . . , X_(T) are derived in which the expected value of the regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) is not more than the upper limit Max (n,T,V) determined from n,T,V. In this case, the benchmark X₁*, X₂*, . . . , X_(T)* need only satisfy Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V and need not be constant (setting V=0 recovers the constant benchmark). It is therefore possible to derive useful subsets X₁, X₂, . . . , X_(T) also for the online submodular minimization problem for which the fixed strategy is not effective.
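A toy instance makes the contrast concrete. In the hypothetical two-round example below (n=2, modular objectives with values in [−1,1]), every fixed subset has cumulative value 0, so a learner that always selects ∅ has zero regret against the best fixed subset, yet it is 2 worse than a benchmark that switches once (path length d_(H)=2, so V=2 suffices). The helper names are illustrative, not from the source.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {0, ..., n-1}."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def hamming(a, b):
    return len(a | b) - len(a & b)


def static_regret(fs, chosen, n):
    """Regret against the best fixed subset (the comparator of Non-patent Literature 1)."""
    loss = sum(f(x) for f, x in zip(fs, chosen))
    best_fixed = min(sum(f(X) for f in fs) for X in subsets(n))
    return loss - best_fixed


def tracking_regret(fs, chosen, benchmark):
    """Regret against a time-varying benchmark X_1*, ..., X_T*."""
    return sum(f(x) for f, x in zip(fs, chosen)) - sum(f(b) for f, b in zip(fs, benchmark))


f1 = lambda X: len(X & {1}) - len(X & {0})  # round 1 rewards picking element 0
f2 = lambda X: len(X & {0}) - len(X & {1})  # round 2 rewards picking element 1
chosen = [frozenset(), frozenset()]
bench = [frozenset({0}), frozenset({1})]
print(static_regret([f1, f2], chosen, 2))        # 0
print(tracking_regret([f1, f2], chosen, bench))  # 2
print(hamming(*bench))                           # 2
```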

First Specific Example of Subset Sequence Derivation Process

The inventors of the present invention have succeeded in proving the following theorem A regarding the online submodular minimization problem in the full-information setting.

Theorem A: If a subset sequence X₁, X₂, . . . , X_(T)∈2^([n]) is derived by the algorithm shown in Table 1 below, the following inequality (1) holds true for any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]). Consequently, the asymptotic behavior of the expected value of the regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) coincides with the asymptotic behavior of {T(n+Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*))}^(1/2). Note that the asymptotic behaviors are compared here while disregarding factors that are polynomial in log T and in log n.

$\begin{matrix}{E\left\lbrack \sum\limits_{t = 1}^{T} f_{t}\left( X_{t} \right) - \sum\limits_{t = 1}^{T} f_{t}\left( X_{t}^{*} \right) \right\rbrack \leq 4\sqrt{T\left( n + 2\sum\limits_{t = 1}^{T - 1} d_{H}\left( X_{t}^{*},X_{t + 1}^{*} \right) \right)} + \sqrt{32T\log\left( \left\lceil \log T \right\rceil + 4 \right)}} & (1)\end{matrix}$

where E[·] represents the expected value with respect to the internal randomness of the algorithm, and ⌈·⌉ represents the smallest integer not less than ·.

TABLE 1

Algorithm 1: An algorithm for online submodular minimization with full information
Require: The number T of rounds and the size n of the ground set.
1: Set d=⌈log T⌉+4 and let p_(1)=(1/d, 1/d, . . . , 1/d)∈Δ_(d). Set η={log d/(8T)}^(1/2). For each j∈[d], initialize x_(1)^((j)) by x_(1)^((j))=0.
2: for t=1, 2, . . . , T do
3:   Set x_(t)=Σ_(j=1)^(d)p_(tj)x_(t)^((j)).
4:   Pick u_(t) from a uniform distribution over [0,1] and output X_(t)=H_(u_(t))(x_(t))={i∈[n]|x_(ti)≥u_(t)}.
5:   Get feedback of f_(t) and compute g_(t), a subgradient of f_(t) at x_(t).
6:   for j=1, 2, . . . , d do
7:     Compute x_(t+1)^((j))∈argmin_(x∈[0,1]^(n))∥x−y_(t+1)^((j))∥₂², where y_(t+1)^((j))=x_(t)^((j))−η^((j))g_(t) with η^((j))=(n/2^(j))^(1/2).
8:   end for
9:   Compute p_(t+1)=w_(t)/∥w_(t)∥₁ with w_(tj)=exp(−ηΣ_(τ=1)^(t)g_(τ)^(⊤)x_(τ)^((j))) (j∈[d]).
10: end for

The following description will discuss, with reference to FIG. 3, a specific example of the subset sequence derivation process S12 obtained by embodying the above theorem. Note that the following description identifies the set S consisting of the n elements with the set [n]={1, 2, . . . , n} of natural numbers. Since the elements of the set S and the elements of the set [n] are in one-to-one correspondence, no generality is lost by this identification. The above theorem merely provides an example of the first example embodiment, and the first example embodiment should not be construed as being limited to the theorem.

FIG. 3 is a flow diagram showing a flow of the subset sequence derivation process S12 in accordance with a specific example of the present invention. As shown in FIG. 3, the subset sequence derivation process S12 includes an initial setting step S121, a subset derivation step S122, a subgradient derivation step S123, and a vector update step S124. The subset derivation step S122, the subgradient derivation step S123, and the vector update step S124 are carried out for each round t∈[T]. That is, these steps are repeatedly carried out T times.

In the subset sequence derivation process S12 in accordance with a specific example of the present invention, a natural number d, a real number η, and d real numbers η⁽¹⁾, η⁽²⁾, . . . , η^((d)) are used as constants. Furthermore, a d-dimensional vector p_(t)∈[0,1]^(d) satisfying ∥p_(t)∥₁=1, an n-dimensional vector x_(t)∈R^(n), and d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) are used as variables. Moreover, T real numbers u₁, u₂, . . . , u_(T) are used as random variables that are uniformly distributed on the interval [0,1].

The initial setting step S121 is a step of setting the constants d, η, η⁽¹⁾, η⁽²⁾, . . . , η^((d)) and initializing the vectors p_(1), x_(1)⁽¹⁾, x_(1)⁽²⁾, . . . , x_(1)^((d)). In the initial setting step S121, the subset sequence derivation unit 12 sets the constant d to, for example, the number obtained by adding 4 to the smallest integer not less than log T. The subset sequence derivation unit 12 sets the constant η to, for example, η={log d/(8T)}^(1/2). For each j∈[d], the subset sequence derivation unit 12 sets the constant η^((j)) to, for example, η^((j))=(n/2^(j))^(1/2). The subset sequence derivation unit 12 initializes the vector p_(1) to, for example, p_(1)=(1/d, 1/d, . . . , 1/d). For each j∈[d], the subset sequence derivation unit 12 initializes the vector x_(1)^((j)) to, for example, x_(1)^((j))=(0, 0, . . . , 0).
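The example settings of step S121 can be sketched as below. The base of the logarithm is not stated in the text; natural logarithms are assumed here, and the function name `init_algorithm1` is hypothetical.

```python
import math
from typing import List, Tuple


def init_algorithm1(n: int, T: int) -> Tuple[int, float, List[float], List[float], List[List[float]]]:
    """Example constants and initial vectors of step S121 (natural log assumed)."""
    d = math.ceil(math.log(T)) + 4                           # d = ceil(log T) + 4
    eta = math.sqrt(math.log(d) / (8 * T))                   # eta = {log d / (8T)}^(1/2)
    etas = [math.sqrt(n / 2 ** j) for j in range(1, d + 1)]  # eta^(j) = (n / 2^j)^(1/2)
    p = [1.0 / d] * d                                        # p_1 = (1/d, ..., 1/d)
    xs = [[0.0] * n for _ in range(d)]                       # x_1^(j) = 0 for every j
    return d, eta, etas, p, xs


d, eta, etas, p, xs = init_algorithm1(n=5, T=100)
print(d)  # 9
```

Each of the d vectors x_(t)^((j)) is updated with its own step size η^((j)), and the weights p_(t) combine them; this is why d grows only logarithmically with T.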

The subset derivation step S122 is a step of deriving the subset X_(t). In the subset derivation step S122, the subset sequence derivation unit 12 first sets the vector x_(t) to, for example, x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)). Note here that p_(tj) represents the jth component of the vector p_(t). Next, the subset sequence derivation unit 12 randomly sets the value of the random variable u_(t). Subsequently, the subset sequence derivation unit 12 derives the subset X_(t) defined by X_(t)={i∈[n]|x_(ti)≥u_(t)}. Note here that x_(ti) represents the ith component of the vector x_(t).
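Step S122 can be sketched as follows, with elements indexed 0, …, n−1 and the helper names chosen for illustration. Because u_(t) is uniform and every x_(ti) lies in [0,1], each element i is included in X_(t) with probability x_(ti).

```python
import random
from typing import FrozenSet, List, Sequence, Tuple


def combine(p: Sequence[float], xs: Sequence[Sequence[float]]) -> List[float]:
    """x_t = sum_j p_tj * x_t^(j)."""
    n = len(xs[0])
    return [sum(p[j] * xs[j][i] for j in range(len(p))) for i in range(n)]


def threshold_round(x: Sequence[float], u: float) -> FrozenSet[int]:
    """H_u(x) = {i : x_i >= u}, with elements indexed 0..n-1 instead of 1..n."""
    return frozenset(i for i, xi in enumerate(x) if xi >= u)


def derive_subset(p: Sequence[float], xs: Sequence[Sequence[float]],
                  rng: random.Random) -> Tuple[List[float], FrozenSet[int]]:
    x = combine(p, xs)
    u = rng.random()  # u_t drawn uniformly from [0, 1)
    return x, threshold_round(x, u)


x, X = derive_subset([0.5, 0.5], [[1.0, 0.0, 0.75], [0.0, 0.0, 0.25]], random.Random(0))
print(x)  # [0.5, 0.0, 0.5]
```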

The subgradient derivation step S123 is a step of deriving a subgradient g_(t) of the objective function f_(t) (more precisely, of its Lovász extension) at x_(t). In the subgradient derivation step S123, it is possible to refer to the value f_(t)(X) of the objective function f_(t) for any subset X⊆[n]. In the subgradient derivation step S123, the subset sequence derivation unit 12 derives, for example, the subgradient g_(t) defined by the following expression (2). In the following expression (2), σ represents a permutation on the set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥. . . ≥x_(tσ(n)). S_(σ)(i) represents the subset of the set [n] defined by S_(σ)(i)={σ(j)|j∈[i]}. χ(i)∈{0,1}^(n) represents an indicator vector in which the ith component is 1 and every other component is 0.

$\begin{matrix}{{g_{t}(\sigma)} = {{\sum\limits_{i = 1}^{n - 1}{{f_{t}\left( {S_{\sigma}(i)} \right)}\left( {{\chi\left( {\sigma(i)} \right)} - {\chi\left( {\sigma\left( {i + 1} \right)} \right)}} \right)}} + {{f_{t}\left( \lbrack n\rbrack \right)}{\chi\left( {\sigma(n)} \right)}}}} & (2)\end{matrix}$
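Expression (2) can be sketched directly; this is an illustrative implementation with 0-based element indices, and the name `subgradient_expr2` is hypothetical. It needs n evaluations of f_(t) on the nested sets S_(σ)(1), …, S_(σ)(n), which is why it requires the full-information setting. When f_(t)(∅)=0 it coincides with the usual subgradient of the Lovász extension; for example, the Lovász extension of min(|X|,1) is max_i x_i, whose subgradient at a point with a unique maximum is the indicator of the maximizing coordinate.

```python
from typing import Callable, FrozenSet, List, Sequence


def subgradient_expr2(f: Callable[[FrozenSet[int]], float], x: Sequence[float]) -> List[float]:
    """Subgradient g_t of expression (2); elements are indexed 0..n-1."""
    n = len(x)
    # sigma orders the coordinates so that x[sigma[0]] >= x[sigma[1]] >= ... (ties by index)
    sigma = sorted(range(n), key=lambda i: -x[i])
    g = [0.0] * n
    prefix = set()
    for pos in range(n):                # pos = i - 1 for i = 1..n in expression (2)
        prefix.add(sigma[pos])
        value = f(frozenset(prefix))    # f_t(S_sigma(i))
        g[sigma[pos]] += value          # + f_t(S_sigma(i)) chi(sigma(i))
        if pos + 1 < n:
            g[sigma[pos + 1]] -= value  # - f_t(S_sigma(i)) chi(sigma(i+1)), only for i <= n-1
    return g


print(subgradient_expr2(lambda X: min(len(X), 1), [0.2, 0.9, 0.5]))  # [0.0, 1.0, 0.0]
```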

The vector update step S124 is a step of updating the vectors p_(t) and x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)). In the vector update step S124, the subset sequence derivation unit 12 updates each vector x_(t)^((j)) in accordance with, for example, the following expression (3). The subset sequence derivation unit 12 updates the vector p_(t) in accordance with, for example, the following expression (4).

$\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}g_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in {\lbrack{0,1}\rbrack}^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & (3)\end{matrix}$ $\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack} g_{\tau}^{\top}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & (4)\end{matrix}$
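Expressions (3) and (4) can be sketched as follows (helper names are illustrative). The Euclidean projection onto the box [0,1]^n reduces to coordinatewise clipping. The weight computation subtracts the smallest cumulative loss before exponentiating; this only rescales w_(t) by a common factor, which cancels in p_(t+1)=w_(t)/∥w_(t)∥₁, and is added here purely for numerical stability.

```python
import math
from typing import List, Sequence, Tuple


def project_box(y: Sequence[float]) -> List[float]:
    """Euclidean projection onto [0,1]^n, which reduces to coordinatewise clipping."""
    return [min(1.0, max(0.0, yi)) for yi in y]


def update_experts(xs: Sequence[Sequence[float]], etas: Sequence[float],
                   g: Sequence[float]) -> List[List[float]]:
    """Expression (3): x_{t+1}^(j) = proj(x_t^(j) - eta^(j) g_t) for every j."""
    return [project_box([xi - eta_j * gi for xi, gi in zip(x_j, g)])
            for x_j, eta_j in zip(xs, etas)]


def update_weights(cum_loss: Sequence[float], xs: Sequence[Sequence[float]],
                   g: Sequence[float], eta: float) -> Tuple[List[float], List[float]]:
    """Expression (4). cum_loss[j] holds sum_{tau < t} g_tau^T x_tau^(j);
    xs must be the current iterates x_t^(j), i.e. taken before update_experts."""
    cum = [c + sum(gi * xi for gi, xi in zip(g, x_j)) for c, x_j in zip(cum_loss, xs)]
    shift = min(cum)  # common factor exp(eta * shift) cancels after normalization
    w = [math.exp(-eta * (c - shift)) for c in cum]
    total = sum(w)
    return cum, [wj / total for wj in w]


print(update_experts([[0.5, 0.5]], [1.0], [1.0, -1.0]))  # [[0.0, 1.0]]
```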

As is clear from theorem A, use of the subset sequence derivation process S12 in accordance with a specific example of the present invention enables the expected value of the regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)* satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V to be not more than the upper limit Max (n,T,V) defined by the following expression (5), which is obtained by substituting Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V into inequality (1):

$\begin{matrix}{{Max}\left( n,T,V \right) = 4\sqrt{T\left( n + 2V \right)} + \sqrt{32T\log\left( \left\lceil \log T \right\rceil + 4 \right)}} & (5)\end{matrix}$
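Expression (5) can be evaluated numerically as a quick sanity check; natural logarithms are assumed because the text does not state the base, and the function name is hypothetical.

```python
import math


def max_bound_full_info(n: int, T: int, V: int) -> float:
    """Expression (5): 4*sqrt(T(n+2V)) + sqrt(32*T*log(ceil(log T) + 4)), natural log assumed."""
    return (4 * math.sqrt(T * (n + 2 * V))
            + math.sqrt(32 * T * math.log(math.ceil(math.log(T)) + 4)))


for V in (0, 10, 100):
    print(V, max_bound_full_info(n=10, T=10 ** 4, V=V) / 10 ** 4)  # bound per round
```

Since the leading term grows like (TV)^(1/2), the bound per round tends to 0 whenever V grows more slowly than T.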

Second Specific Example of Subset Sequence Derivation Process

The inventors of the present invention have succeeded in proving the following theorem B regarding the online submodular minimization problem in the bandit feedback setting.

Theorem B: If the subset sequence X₁, X₂, . . . , X_(T)∈2^([n]) is derived by the algorithm shown in Table 2 below, the following inequality (6) holds true for any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]). Consequently, when the exploration parameter γ is chosen appropriately, the asymptotic behavior of the expected value of the regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) coincides with the asymptotic behavior of nT^(2/3){(log log T/n)^(1/2)+(1+Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)/n)^(1/2)}^(2/3).

$\begin{matrix}{{E\left\lbrack {{\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t} \right)}} - {\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t}^{*} \right)}}} \right\rbrack} \leq {{\gamma T} + {4\left( {n + 1} \right)\sqrt{\frac{T}{\gamma}}\left( {{2\sqrt{\log\log T}} + \sqrt{n + {\sum\limits_{t = 1}^{T - 1}{d_{H}\left( {X_{t}^{*},X_{t + 1}^{*}} \right)}}}} \right)}}} & (6)\end{matrix}$

where γ represents a predetermined constant not less than 0 and not more than 1 (inequality (6) is meaningful only for γ>0), the predetermined constant being called an exploration parameter.

TABLE 2

Algorithm 2: An algorithm for online submodular minimization with bandit feedback
Require: The number T of rounds, the size n of the ground set, and the exploration parameter γ∈[0,1].
1: Set d=4⌈log T⌉ and let p_(1)=(1/d, 1/d, . . . , 1/d)∈Δ_(d). Set η=[log d/{2(n+1)²T}]^(1/2). For each j∈[d], initialize x_(1)^((j)) by x_(1)^((j))=0.
2: for t=1, 2, . . . , T do
3:   Set x_(t)=Σ_(j=1)^(d)p_(tj)x_(t)^((j)).
4:   Output X_(t) given by X_(t)=H_(u_(t))(x_(t)) (u_(t)∼Unif([0,1])) with probability 1−γ, and X_(t)=S_(σ)(s_(t)) (s_(t)∼Unif({0,1, . . . , n})) with probability γ, where S_(σ)(i)={σ(j)|j∈[i]}.
5:   Observe f_(t)(X_(t)).
6:   Compute ĝ_(t) given by ĝ_(t)=(1/q_(ti_(t)))f_(t)(X_(t))(χ(σ(i_(t)))−χ(σ(i_(t)+1))), where q_(ti)=γ·1/(n+1)+(1−γ)·(x_(t,σ(i))−x_(t,σ(i+1))).
7:   for j=1, 2, . . . , d do
8:     Compute x_(t+1)^((j))∈argmin_(x∈[0,1]^(n))∥x−y_(t+1)^((j))∥₂², where y_(t+1)^((j))=x_(t)^((j))−η^((j))ĝ_(t) with η^((j))=(n/2^(j))^(1/2).
9:   end for
10:  Compute p_(t+1)=w_(t)/∥w_(t)∥₁ with w_(tj)=exp(−ηΣ_(τ=1)^(t)ĝ_(τ)^(⊤)x_(τ)^((j))) (j∈[d]).
11: end for

The following description will discuss, with reference to FIG. 4, a specific example of the subset sequence derivation process S12 obtained by embodying the above theorem. Note that the following description identifies the set S consisting of the n elements with the set [n]={1, 2, . . . , n} of natural numbers. Since the elements of the set S and the elements of the set [n] are in one-to-one correspondence, no generality is lost by this identification. The above theorem merely provides an example of the first example embodiment, and the first example embodiment should not be construed as being limited to the theorem.

FIG. 4 is a flow diagram showing a flow of the subset sequence derivation process S12 in accordance with a specific example of the present invention. As shown in FIG. 4, the subset sequence derivation process S12 includes an initial setting step S125, a subset derivation step S126, an unbiased estimator derivation step S127, and a vector update step S128. The subset derivation step S126, the unbiased estimator derivation step S127, and the vector update step S128 are carried out for each round t∈[T]. That is, these steps are repeatedly carried out T times.

In the subset sequence derivation process S12 in accordance with a specific example of the present invention, the natural number d, the real number η, and the d real numbers η⁽¹⁾, η⁽²⁾, . . . , η^((d)) are used as constants. Furthermore, the d-dimensional vector p_(t)∈[0,1]^(d) satisfying ∥p_(t)∥₁=1, the n-dimensional vector x_(t)∈R^(n), and the d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) are used as variables. Moreover, the T real numbers u₁, u₂, . . . , u_(T) are used as random variables that are uniformly distributed on the interval [0,1]. Further, T integers s₁, s₂, . . . , s_(T) are used as random variables that are uniformly distributed on {0, 1, . . . , n}.

The initial setting step S125 is a step of setting the constants d, η, η⁽¹⁾, η⁽²⁾, . . . , η^((d)) and initializing the vectors p_(1), x_(1)⁽¹⁾, x_(1)⁽²⁾, . . . , x_(1)^((d)). In the initial setting step S125, the subset sequence derivation unit 12 sets the constant d to, for example, four times the smallest integer not less than log T. The subset sequence derivation unit 12 sets the constant η to, for example, η=[log d/{2(n+1)²T}]^(1/2). For each j∈[d], the subset sequence derivation unit 12 sets the constant η^((j)) to, for example, η^((j))=(n/2^(j))^(1/2). The subset sequence derivation unit 12 initializes the vector p_(1) to, for example, p_(1)=(1/d, 1/d, . . . , 1/d). For each j∈[d], the subset sequence derivation unit 12 initializes the vector x_(1)^((j)) to, for example, x_(1)^((j))=(0, 0, . . . , 0).

The subset derivation step S126 is a step of deriving the subset X_(t). In the subset derivation step S126, the subset sequence derivation unit 12 first sets the vector x_(t) to, for example, x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)). Note here that p_(tj) represents the jth component of the vector p_(t). Next, the subset sequence derivation unit 12 randomly sets the values of the random variables u_(t) and s_(t). Subsequently, the subset sequence derivation unit 12 derives either the subset X_(t) defined by X_(t)={i∈[n]|x_(ti)≥u_(t)} or the subset X_(t) defined by X_(t)={σ(j)|j∈[s_(t)]}. Note here that x_(ti) represents the ith component of the vector x_(t), and σ represents the permutation on the set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥. . . ≥x_(tσ(n)). The probability with which the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} is derived in the subset derivation step S126 is set to 1−γ. In other words, the probability with which X_(t)={σ(j)|j∈[s_(t)]} is derived in the subset derivation step S126 is set to γ.
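Step S126 can be sketched as follows (0-based indices; the function name `derive_subset_bandit` is hypothetical). Both branches always return a prefix set S_(σ)(i) of the order σ, because a threshold set {i|x_(ti)≥u_(t)} consists of the coordinates with the largest values. This is what makes the index i_(t) in step S127 well defined.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple


def derive_subset_bandit(x: Sequence[float], gamma: float,
                         rng: random.Random) -> Tuple[FrozenSet[int], List[int]]:
    """Step S126 sketch: returns X_t and the permutation sigma (0-based indices)."""
    n = len(x)
    sigma = sorted(range(n), key=lambda i: -x[i])  # x[sigma[0]] >= x[sigma[1]] >= ...
    if rng.random() < 1 - gamma:                   # threshold rounding, probability 1 - gamma
        u = rng.random()                           # u_t ~ Unif[0, 1)
        X = frozenset(i for i in range(n) if x[i] >= u)
    else:                                          # exploration, probability gamma
        s = rng.randint(0, n)                      # s_t ~ Unif{0, 1, ..., n}
        X = frozenset(sigma[:s])                   # S_sigma(s_t)
    return X, sigma


rng = random.Random(0)
X, sigma = derive_subset_bandit([0.9, 0.1, 0.5], gamma=0.2, rng=rng)
print(sigma)  # [0, 2, 1]
```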

The unbiased estimator derivation step S127 is a step of deriving an unbiased estimator ĝ_(t) of the subgradient g_(t) of the objective function f_(t) at x_(t). In the unbiased estimator derivation step S127, it is possible to refer only to the value f_(t)(X_(t)) of the objective function f_(t) for the subset X_(t)⊆[n] derived in the subset derivation step S126. In the unbiased estimator derivation step S127, the subset sequence derivation unit 12 derives, for example, the unbiased estimator ĝ_(t) defined by the following expression (7). In the following expression (7), σ represents the permutation on the set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥. . . ≥x_(tσ(n)). q_(t) represents a vector whose ith component q_(ti) is defined by q_(ti)=γ/(n+1)+(1−γ)(x_(tσ(i))−x_(tσ(i+1))). i_(t) represents the integer i_(t)∈{0, 1, . . . , n} satisfying X_(t)=S_(σ)(i_(t)).

$\begin{matrix}{{\hat{g}}_{t} = {\frac{1}{q_{{ti}_{t}}}{f_{t}\left( X_{t} \right)}\left( {{\chi\left( {\sigma\left( i_{t} \right)} \right)} - {\chi\left( {\sigma\left( {i_{t} + 1} \right)} \right)}} \right)}} & (7)\end{matrix}$
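Expression (7) can be sketched as below, together with an exact check of its unbiasedness on a small example. The boundary conventions x_(tσ(0))=1, x_(tσ(n+1))=0 and χ(σ(0))=χ(σ(n+1))=0 are our assumptions for the cases i_(t)=0 and i_(t)=n, which the text does not spell out. Under them, and when the components of x_(t) are distinct, q_(ti) is exactly the probability that step S126 outputs S_(σ)(i), so averaging (7) over i with weights q_(ti) gives Σ_(i=0)^(n)f_(t)(S_(σ)(i))(χ(σ(i))−χ(σ(i+1))), which equals expression (2) when f_(t)(∅)=0. Function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def q_vector(x: Sequence[float], sigma: Sequence[int], gamma: float) -> List[float]:
    """q_ti for i = 0..n, assuming the conventions x_sigma(0) = 1 and x_sigma(n+1) = 0."""
    n = len(x)
    padded = [1.0] + [x[s] for s in sigma] + [0.0]  # padded[i] = x_{t, sigma(i)}
    return [gamma / (n + 1) + (1 - gamma) * (padded[i] - padded[i + 1]) for i in range(n + 1)]


def estimate_gradient(f_value: float, X: FrozenSet[int], sigma: Sequence[int],
                      q: Sequence[float]) -> List[float]:
    """Expression (7); chi(sigma(0)) and chi(sigma(n+1)) are taken as the zero vector."""
    n = len(sigma)
    i_t = len(X)                      # X_t = S_sigma(i_t) has exactly i_t elements
    coef = f_value / q[i_t]
    g_hat = [0.0] * n
    if i_t >= 1:
        g_hat[sigma[i_t - 1]] += coef   # + chi(sigma(i_t)); sigma is 0-based here
    if i_t + 1 <= n:
        g_hat[sigma[i_t]] -= coef       # - chi(sigma(i_t + 1))
    return g_hat


x, gamma = [0.2, 0.9, 0.5], 0.3
sigma = sorted(range(3), key=lambda i: -x[i])
f = lambda X: min(len(X), 1)            # f(∅) = 0
q = q_vector(x, sigma, gamma)
expected = [0.0, 0.0, 0.0]
for i in range(4):                      # X_t = S_sigma(i) is output with probability q[i]
    X = frozenset(sigma[:i])
    for k, gk in enumerate(estimate_gradient(f(X), X, sigma, q)):
        expected[k] += q[i] * gk
print(all(abs(a - b) < 1e-12 for a, b in zip(expected, [0.0, 1.0, 0.0])))  # True
```

The exploration term γ/(n+1) keeps every q_(ti) positive, so the importance weight 1/q_(ti_(t)) in (7) is always finite when γ>0.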

The vector update step S128 is a step of updating the vectors p_(t) and x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)). In the vector update step S128, the subset sequence derivation unit 12 updates each vector x_(t)^((j)) in accordance with, for example, the following expression (8). The subset sequence derivation unit 12 updates the vector p_(t) in accordance with, for example, the following expression (9).

$\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}{\hat{g}}_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in {\lbrack{0,1}\rbrack}^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & (8)\end{matrix}$ $\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}{\hat{g}}_{\tau}^{\top}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & (9)\end{matrix}$

As is clear from theorem B, use of the subset sequence derivation process S12 in accordance with a specific example of the present invention enables the expected value of the regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)* satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V to be not more than the upper limit Max (n,T,V) defined by the following expression (10), which is obtained by substituting Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V into inequality (6):

$\begin{matrix}{{{Max}\left( {n,T,V} \right)} = {{\gamma T} + {4\left( {n + 1} \right)\sqrt{\frac{T}{\gamma}}\left( {{2\sqrt{\log\log T}} + \sqrt{n + V}} \right)}}} & (10)\end{matrix}$
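The role of γ in expression (10) can be illustrated numerically. Writing (10) as γT+c(T/γ)^(1/2) with c=4(n+1)(2(log log T)^(1/2)+(n+V)^(1/2)), setting the derivative in γ to zero gives γ*=(c/(2T^(1/2)))^(2/3). At that point the bound equals 3γ*T, which scales as T^(2/3)c^(2/3). The sketch below assumes natural logarithms and T≥3 (so that log log T≥0), clips γ* to 1, and uses hypothetical function names; it illustrates this balancing and is not presented as the tuning rule of the embodiment.

```python
import math


def max_bound_bandit(n: int, T: int, V: int, gamma: float) -> float:
    """Expression (10); natural logarithms assumed, and T >= 3 so that log log T >= 0."""
    c = 4 * (n + 1) * (2 * math.sqrt(math.log(math.log(T))) + math.sqrt(n + V))
    return gamma * T + c * math.sqrt(T / gamma)


def balanced_gamma(n: int, T: int, V: int) -> float:
    """Minimizer of gamma*T + c*sqrt(T/gamma) over gamma > 0, clipped to 1."""
    c = 4 * (n + 1) * (2 * math.sqrt(math.log(math.log(T))) + math.sqrt(n + V))
    return min(1.0, (c / (2 * math.sqrt(T))) ** (2 / 3))


n, T, V = 10, 10 ** 9, 0
g = balanced_gamma(n, T, V)
print(round(g, 3))                       # 0.028
print(max_bound_bandit(n, T, V, g) / T)  # equals 3 * g at the unclipped optimum
```

A larger V increases c and therefore calls for more exploration, which matches the intuition that a benchmark that changes more often is harder to track from bandit feedback.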

Second Example Embodiment

A second example embodiment of the present invention will be described in detail with reference to the drawings.

Online Submodular Maximization Problem

Considered are (i) a set S consisting of n elements and (ii) anobjective function f_(t):2^(S)→R_(≥0) defined for each round t∈[T]. Notehere that n and T each represent any natural number. [T] represents aset of natural numbers not less than 1 and not more than T. 2^(S)represents a power set of a set S, that is, a set consisting of allsubsets of the set S. R_(≥0) represents nonnegative real numbers as awhole. It is assumed that each objective function f_(t) is a normalizedsubmodular function.

Among problems of selecting a subset sequence X₁, X₂, . . . , X_(T)∈2^(S), a problem whose target is maximization of the cumulative value Σ_(t∈[T])f_(t)(X_(t)) of the objective function f_(t) is referred to as an “online submodular maximization problem”. In the second example embodiment, the online submodular maximization problem is studied in the full-information setting (described earlier).

Configuration of Information Processing Apparatus

A configuration of an information processing apparatus 2 in accordance with the second example embodiment will be described with reference to FIG. 5. FIG. 5 is a block diagram illustrating a configuration of the information processing apparatus 2.

The information processing apparatus 2 is an apparatus for solving the online submodular maximization problem related to the set S consisting of the n elements. As illustrated in FIG. 5, the information processing apparatus 2 includes an objective function setting unit 21 and a subset sequence derivation unit 22.

The objective function setting unit 21 is a means that sets, as the objective function f_(t) in each round t, a submodular function on the power set 2^(S) of the set S. The objective function setting unit 21 is an example of the “objective function setting means” in the claims. The submodular function that the objective function setting unit 21 sets as the objective function f_(t) may be (i) predetermined, (ii) input by a user via a keyboard or the like, or (iii) input by another apparatus via a communication network or the like. Alternatively, this submodular function may be generated in various processes carried out inside the information processing apparatus 2.

The subset sequence derivation unit 22 is a means that derives a subset sequence X₁, X₂, . . . , X_(T) satisfying a condition β1 or β2 below. The subset sequence derivation unit 22 is an example of the “subset sequence derivation means” in the claims. The subset sequence X₁, X₂, . . . , X_(T) derived by the subset sequence derivation unit 22 may be provided to a user via a display or the like, provided to another apparatus via a communication network or the like, or used in various processes carried out inside the information processing apparatus 2.

The condition β1 is as follows, where k is a given natural number, V is a given integer not less than 0, and α is a given approximation ratio: each subset X_(t) satisfies |X_(t)|≤k, and the expected value of the α regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V has the same asymptotic behavior as a function A (k,T,V) determined from k, T, and V.

The condition β2 is as follows, where V is a given integer not less than 0: the expected value of the α regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V has the same asymptotic behavior as a function B (n,T,V) determined from n, T, and V.
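Conditions β1 and β2 compare the chosen subsets against a benchmark that may change from round to round, subject to a budget V on its total Hamming movement. The following Python sketch evaluates the α regret of one realized run and checks whether a benchmark sequence is admissible. It assumes that subsets are frozensets and each f_(t) is a Python callable, and it computes a single realized value rather than the expectation; the function names are hypothetical.

```python
def hamming(A, B):
    """d_H(A, B) = |A ∪ B| - |A ∩ B|, the size of the symmetric difference."""
    return len(A | B) - len(A & B)

def path_length(benchmark):
    """sum_{t in [T-1]} d_H(X_t*, X_{t+1}*)."""
    return sum(hamming(a, b) for a, b in zip(benchmark, benchmark[1:]))

def alpha_regret(fs, chosen, benchmark, alpha):
    """alpha * sum_t f_t(X_t*) - sum_t f_t(X_t) for one realized run."""
    gain_star = sum(f(b) for f, b in zip(fs, benchmark))
    gain = sum(f(x) for f, x in zip(fs, chosen))
    return alpha * gain_star - gain

def admissible(benchmark, V, k=None):
    """Benchmark constraint of condition beta1 (k given) or beta2 (k=None)."""
    if k is not None and any(len(b) > k for b in benchmark):
        return False
    return path_length(benchmark) <= V
```

A benchmark that switches once from {1} to {2} has path length 2, so it is admissible for V≥2 but not for V=1.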

Information Processing Method

A flow of an information processing method S2 in accordance with the second example embodiment will be described with reference to FIG. 6. FIG. 6 is a flow diagram showing the flow of the information processing method S2.

The information processing method S2 is a method for solving the online submodular maximization problem related to the set S consisting of the n elements. As illustrated in FIG. 6, the information processing method S2 includes an objective function setting process S21 and a subset sequence derivation process S22. The information processing method S2 is carried out by, for example, the information processing apparatus 2.

The objective function setting process S21 is a process for setting, as the objective function f_(t) in each round t, the submodular function on the power set 2^(S) of the set S. The objective function setting process S21 is carried out by, for example, the objective function setting unit 21 of the information processing apparatus 2. The subset sequence derivation process S22 is a process for deriving the subset sequence X₁, X₂, . . . , X_(T) satisfying the condition β1 or β2 shown in the previous section. The subset sequence derivation process S22 is carried out by, for example, the subset sequence derivation unit 22 of the information processing apparatus 2.

Effect of Information Processing Apparatus and Information Processing Method

In the information processing apparatus 2 and the information processing method S2 in accordance with the second example embodiment, subsets X₁, X₂, . . . , X_(T) are derived for which the expected value of the α regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) grows no faster than the function A (k,T,V) under the condition β1 or the function B (n,T,V) under the condition β2. Here, the benchmark X₁*, X₂*, . . . , X_(T)* need not be constant over the rounds. It is therefore possible to derive useful subsets X₁, X₂, . . . , X_(T) also for online submodular maximization problems for which a fixed strategy is not effective.

First Specific Example of Subset Sequence Derivation Process

The inventors of the present invention have succeeded in proving the following theorem C regarding the online submodular maximization problem in the full-information setting in which the number of elements of the subset X_(t) is constrained to be at most k.

Theorem C: If each objective function f_(t) is monotone and a subset sequence X₁, X₂, . . . , X_(T)∈2^([n]), in which each subset X_(t) consists of k or fewer elements, is derived by the algorithms shown in Tables 3 and 4 below, then the following evaluation formula (11) holds for any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]) in which each subset X_(t)* consists of k or fewer elements. Here, the objective function f_(t) being monotone means that f_(t)(X)≤f_(t)(Y) holds for any subsets X,Y∈2^([n]) satisfying X⊆Y. In addition, Õ (Landau's O with a tilde above) denotes asymptotic behavior that ignores polynomial factors in log T and log n.

$\begin{matrix}{{E\left\lbrack {{\left( {1 - \frac{1}{e}} \right){\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t}^{*} \right)}}} - {\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t} \right)}}} \right\rbrack} = {\overset{\sim}{O}\left( \sqrt{{kT}\left( {k + {\sum\limits_{t = 1}^{T - 1}{d_{H}\left( {X_{t}^{*},X_{t + 1}^{*}} \right)}}} \right)} \right)}} & (11)\end{matrix}$

TABLE 3 Algorithm 3: Algorithm for submodular maximization under size constraint
Require: The number [...] the base set [...] the size-constrained parameter [...] such that [...]
1: Initialize [...] copies [...] (Algorithm 4) with parameters T and [...]
2: for [...] do
3:   S[...]
4:   for [...] do
5:     Get the [...] output [...]
6:     Draw an item [...]
7:     [...]
8:   end for
9:   Output [...]
10:  for [...] 1 [...] 2 [...] do
11:    Set [...] for each [...]
12:    [...]
13:  end for
14: end for

[...] indicates data missing or illegible when filed.

TABLE 4 Algorithm 4: FSF*
Require: The number T of rounds and the number n of actions.
1: [Initialization step] Set $J = \lceil \log T \rceil$, $\eta = \sqrt{\frac{\log d}{T}}$, and initialize $w_t = (w_{t1}, w_{t2}, \ldots, w_{td})^{\top}$ by $w_{1j} = 1$ for $j = 1, 2, \ldots, J$. For $j = 1, 2, \ldots, J$, set $(\eta^{(j)}, \alpha^{(j)}) =$ [...] and initialize $w_t^{(j)} = (w_{t1}^{(j)}, w_{t2}^{(j)}, \ldots, w_{tn}^{(j)})^{\top}$ by $w_{1i}^{(j)} = 1$ for $i = 1, 2, \ldots, n$.
2: for $t = 1, \ldots, T$ do
3:   Set $q_t = \frac{w_t}{\|w_t\|_1}$ and $p_t^{(j)} = \frac{w_t^{(j)}}{\|w_t^{(j)}\|_1}$ for $j = 1, 2, \ldots, J$.
4:   [t-th output] Compute $p_t = \sum_{j=1}^{J} q_{tj}\, p_t^{(j)}$ and output $p_t$.
5:   [t-th input] Get feedback of $l_t = (l_{t1}, l_{t2}, \ldots, l_{tn})^{\top}$.
6:   for $j = 1, 2, \ldots, J$ do
7:     Compute $v_{ti}^{(j)} = w_{ti}^{(j)} \exp(\eta^{(j)} l_{ti})$ for $i = 1, 2, \ldots, n$.
8:     Update $w_t^{(j)}$ by $w_{t+1,i}^{(j)} = \alpha^{(j)}\frac{W_t^{(j)}}{n} + (1 - \alpha^{(j)})\, v_{ti}^{(j)}$ for $i = 1, 2, \ldots, n$, where $W_t^{(j)} = v_{t1}^{(j)} + \cdots + v_{tn}^{(j)}$.
9:     Update $w_{tj}$ by $w_{t+1,j} = w_{tj} \exp(\eta\, l_t^{\top} p_t^{(j)})$.
10:  end for
11: end for

[...] indicates text missing or illegible when filed.

Note that the algorithm shown in Table 4 includes J fixed share forecaster (FSF) algorithms corresponding to different parameters η^((j)). Since the FSF algorithm is publicly known, a detailed description thereof is omitted here. In the following description, {i_(t1), i_(t2), . . . , i_(ts)} is referred to as X_(ts), and {i_(t1), i_(t2), . . . , i_(tk)} is referred to as X_(t). Furthermore, X_(ts)∪{i_(t,s+1)} is referred to as X_(t,s+1), and f_(t)(X_(ts)∪{i})−f_(t)(X_(ts)) is referred to as l_(ti)^((s+1)).
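For orientation, the structure of Algorithm 4 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the algorithm as filed: the per-copy parameters (η^((j)), α^((j))) are illegible in Table 4, so the geometric grid below is a hypothetical placeholder, d in η=√(log d/T) is read as J, and the class and method names are invented for illustration.

```python
import math

class FSFStar:
    """Sketch of Algorithm 4 (FSF*): J fixed share forecasters over n actions,
    mixed by an exponential-weights layer. The feedback l_t is treated as a
    gain vector (larger is better), as in the maximization setting."""

    def __init__(self, T, n, etas=None, alphas=None):
        self.n = n
        self.J = max(1, math.ceil(math.log(T)))        # J = ceil(log T)
        # eta = sqrt(log d / T) in Table 4; d is read here as J (assumption),
        # floored at 2 so that eta stays positive when J = 1.
        self.eta = math.sqrt(math.log(max(self.J, 2)) / T)
        # HYPOTHETICAL per-copy parameters: (eta^(j), alpha^(j)) are illegible
        # in Table 4, so a geometric grid of step sizes is a placeholder.
        self.etas = etas or [2.0 ** (-j) for j in range(self.J)]
        self.alphas = alphas or [1.0 / T] * self.J
        self.w = [1.0] * self.J                         # meta weights w_tj
        self.ws = [[1.0] * n for _ in range(self.J)]    # copy weights w_t^(j)

    @staticmethod
    def _normalize(v):
        total = sum(v)
        return [x / total for x in v]

    def predict(self):
        """Steps 3-4: p_t = sum_j q_tj p_t^(j)."""
        q = self._normalize(self.w)
        ps = [self._normalize(wj) for wj in self.ws]
        return [sum(q[j] * ps[j][i] for j in range(self.J)) for i in range(self.n)]

    def update(self, gains):
        """Steps 6-10 for the feedback vector l_t = gains."""
        for j in range(self.J):
            p_j = self._normalize(self.ws[j])          # p_t^(j) before updating
            # Step 7: exponential-weights step of copy j.
            v = [wi * math.exp(self.etas[j] * li) for wi, li in zip(self.ws[j], gains)]
            W = sum(v)
            # Step 8: fixed share, i.e. redistribute a fraction alpha^(j) uniformly.
            a = self.alphas[j]
            mixed = [a * W / self.n + (1 - a) * vi for vi in v]
            # Rescaling leaves p^(j) unchanged and keeps the numbers bounded.
            self.ws[j] = self._normalize(mixed)
            # Step 9: meta-weight update with the gain l_t^T p_t^(j).
            self.w[j] *= math.exp(self.eta * sum(l * p for l, p in zip(gains, p_j)))
        self.w = self._normalize(self.w)
```

The fixed share mixing keeps every action's weight bounded away from zero, which is what lets each copy track a best action that changes over time.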

The following description will discuss, with reference to FIG. 7, a specific example of the subset sequence derivation process S22 that is obtained by embodying the above theorem C. Note that the following description identifies the set S consisting of the n elements with the set [n]={1,2, . . . , n} of natural numbers. Since the elements of the set S and the elements of the set [n] are in one-to-one correspondence, generality is not lost by such identification. The above theorem merely provides an example of the second example embodiment, and the second example embodiment should not be construed as being limited to the theorem.

FIG. 7 is a flow diagram showing a flow of the subset sequence derivation process S22 in accordance with this specific example of the present invention. As shown in FIG. 7, the subset sequence derivation process S22 includes an FSF algorithm initialization step S221, a subset derivation step S222, and a feed generation step S223. The subset derivation step S222 and the feed generation step S223 are carried out in each round t∈[T]; that is, these steps are repeated T times.

The FSF algorithm initialization step S221 is a step of initializing, in accordance with the number T of rounds, k FSF algorithm execution modules FSF*⁽¹⁾, FSF*⁽²⁾, . . . , FSF*^((k)) that execute the FSF algorithms.

The subset derivation step S222 is a step of deriving the subset X_(t). In the subset derivation step S222, after setting X_(t0)=Ø, the subset sequence derivation unit 22 repeatedly carries out the following process for s=1,2, . . . , k. First, the subset sequence derivation unit 22 reads a vector p_(t)^((s)) that is output by the FSF algorithm execution module FSF*^((s)). Next, the subset sequence derivation unit 22 draws an element i_(ts) in accordance with the read vector p_(t)^((s)). Subsequently, the subset sequence derivation unit 22 uses the drawn element i_(ts) to generate X_(ts)=X_(t,s−1)∪{i_(ts)}. By repeating this process for s=1,2, . . . , k, the subset sequence derivation unit 22 derives the subset X_(t)=X_(tk).

The feed generation step S223 is a step of generating feeds l_(t)⁽¹⁾, l_(t)⁽²⁾, . . . , l_(t)^((k)) to be input to the respective FSF algorithm execution modules FSF*⁽¹⁾, FSF*⁽²⁾, . . . , FSF*^((k)). In the feed generation step S223, the subset sequence derivation unit 22 generates a feed l_(t)^((s))=(l_(t1)^((s)), l_(t2)^((s)), . . . , l_(tn)^((s))) to be input to the FSF algorithm execution module FSF*^((s)) in accordance with l_(ti)^((s))=f_(t)(X_(t,s−1)∪{i})−f_(t)(X_(t,s−1)) (i∈[n]).
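One round of steps S222 and S223 can be sketched in Python as follows. The sketch assumes that the k probability vectors p_(t)^((1)), . . . , p_(t)^((k)) have already been obtained (for example, from k FSF* modules), indexes the elements as 0, . . . , n−1 instead of 1, . . . , n, and represents subsets as frozensets; the function name is hypothetical.

```python
import random

def size_constrained_round(prob_vectors, f, n, rng=random):
    """One round of steps S222-S223 (illustrative sketch).

    prob_vectors : k probability vectors p_t^(1..k) over items 0..n-1
    f            : objective f_t for this round, called on frozensets
    rng          : random source providing choices()
    Returns (X_t, feeds) where feeds[s-1] is the feed l_t^(s).
    """
    X = frozenset()                                  # X_t0 = Ø
    prefixes = []                                    # X_t,s-1 for s = 1..k
    for p in prob_vectors:
        prefixes.append(X)
        i = rng.choices(range(n), weights=p)[0]      # draw i_ts ~ p_t^(s)
        X = X | {i}                                  # X_ts = X_t,s-1 ∪ {i_ts}
    # S223: l_ti^(s) = f_t(X_t,s-1 ∪ {i}) - f_t(X_t,s-1) for every item i.
    feeds = [[f(P | {i}) - f(P) for i in range(n)] for P in prefixes]
    return X, feeds
```

Because a drawn element may already be in X_(t,s−1), the resulting subset has at most k elements, which matches the size constraint in theorem C.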

As follows from theorem C, when each objective function f_(t) is monotone, using the subset sequence derivation process S22 in accordance with this specific example of the present invention ensures that the expected value of the (1−1/e) regret (1−1/e)Σ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]) that satisfies Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V and consists of subsets X_(t)* having k or fewer elements has the same asymptotic behavior as the function A (k,T,V) represented by the following expression (12):

$\begin{matrix}{A(k,T,V) = \sqrt{kT(k+V)}} & (12)\end{matrix}$

Second Specific Example of Subset Sequence Derivation Process

The inventors of the present invention have succeeded in proving the following theorem D regarding the online submodular maximization problem in the full-information setting in which the number of elements of the subset X_(t) is not constrained.

Theorem D: If a subset sequence X₁, X₂, . . . , X_(T)∈2^([n]) is derived by the algorithm shown in Table 5 below, the following evaluation formula (13) holds for any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]).

$\begin{matrix}{{E\left\lbrack {{\frac{1}{2}{\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t}^{*} \right)}}} - {\sum\limits_{t = 1}^{T}{f_{t}\left( X_{t} \right)}}} \right\rbrack} = {\overset{\sim}{O}\left( {n\sqrt{T\left( {1 + {\frac{1}{n}{\sum\limits_{t = 1}^{T - 1}{d_{H}\left( {X_{t}^{*},X_{t + 1}^{*}} \right)}}}} \right)}} \right)}} & (13)\end{matrix}$

TABLE 5 Algorithm 5: Algorithm for [...] Submodular Max[...]
Require: The number [...] of rounds, the size [...] of the [...]
1: Initialize [...] copies [...] (Algorithm 4) with parameters [...] and 2.
2: for [...] = 1 [...] do
3:   Set [...]
4:   for s = [...] do
5:     Get the [...] output [...] from [...] and set [...]
6:     With probability [...] set [...] Otherwise, (with probability [...]
7:   end for
8:   Output [...] and get feedback of [...]
9:   for [...] do
10:    Set [...]
11:    Set [...]
12:    [...] as the [...] input
13:  end for
14: end for

[...] indicates data missing or illegible when filed.

The following description will discuss, with reference to FIG. 8, a specific example of the subset sequence derivation process S22 that is obtained by embodying the above theorem D. Note that the following description identifies the set S consisting of the n elements with the set [n]={1,2, . . . , n} of natural numbers. Since the elements of the set S and the elements of the set [n] are in one-to-one correspondence, generality is not lost by such identification. The above theorem merely provides an example of the second example embodiment, and the second example embodiment should not be construed as being limited to the theorem.

FIG. 8 is a flow diagram showing a flow of the subset sequence derivation process S22 in accordance with this specific example of the present invention. As shown in FIG. 8, the subset sequence derivation process S22 includes an FSF algorithm initialization step S224, a subset derivation step S225, and a feed generation step S226. The subset derivation step S225 and the feed generation step S226 are carried out in each round t∈[T]; that is, these steps are repeated T times.

The FSF algorithm initialization step S224 is a step of initializing, in accordance with the number T of rounds, n FSF algorithm execution modules FSF*⁽¹⁾, FSF*⁽²⁾, . . . , FSF*^((n)) that execute the FSF algorithms.

The subset derivation step S225 is a step of deriving the subset X_(t). In the subset derivation step S225, after setting X_(t0)=Ø and Y_(t0)=[n], the subset sequence derivation unit 22 repeatedly carries out the following process for s=1,2, . . . , n. First, the subset sequence derivation unit 22 reads the vector p_(t)^((s)) that is output by the FSF algorithm execution module FSF*^((s)) and sets q_(t)^((s))=(1+2p_(t1)^((s)))/4. Next, with probability q_(t)^((s)), the subset sequence derivation unit 22 sets X_(ts)=X_(t,s−1)∪{s} and Y_(ts)=Y_(t,s−1). Otherwise (with probability 1−q_(t)^((s))), the subset sequence derivation unit 22 sets X_(ts)=X_(t,s−1) and Y_(ts)=Y_(t,s−1)\{s}. By repeating this process for s=1,2, . . . , n, the subset sequence derivation unit 22 derives the subset X_(t)=X_(tn)=Y_(tn).

The feed generation step S226 is a step of generating the feeds l_(t)⁽¹⁾, l_(t)⁽²⁾, . . . , l_(t)^((n)) to be input to the respective FSF algorithm execution modules FSF*⁽¹⁾, FSF*⁽²⁾, . . . , FSF*^((n)). In the feed generation step S226, the subset sequence derivation unit 22 sets α_(ts)=f_(t)(X_(t,s−1)∪{s})−f_(t)(X_(t,s−1)) and β_(ts)=f_(t)(Y_(t,s−1)\{s})−f_(t)(Y_(t,s−1)). The subset sequence derivation unit 22 then generates a feed l_(t)^((s))=(l_(t1)^((s)), l_(t2)^((s))) to be input to the FSF algorithm execution module FSF*^((s)) in accordance with l_(t1)^((s))=(1−q_(t)^((s)))α_(ts) and l_(t2)^((s))=q_(t)^((s))β_(ts).
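Similarly, one round of steps S225 and S226 can be sketched in Python. The sketch assumes that the first components p_(t1)^((s)) of the n FSF* outputs are given, indexes elements as 0, . . . , n−1, and accepts any object with a random() method as the random source so that the example can be made deterministic; all names are hypothetical.

```python
import random

def double_greedy_round(p1, f, n, rng=random):
    """One round of steps S225-S226 (items indexed 0..n-1).

    p1  : p1[s] is the first component p_t1^(s) output by FSF* module s
    f   : objective f_t for this round, called on frozensets
    rng : random source providing random() in [0, 1)
    Returns (X_t, feeds) with feeds[s] = (l_t1^(s), l_t2^(s)).
    """
    X, Y = frozenset(), frozenset(range(n))      # X_t0 = Ø, Y_t0 = [n]
    qs, Xs, Ys = [], [], []
    for s in range(n):
        q = (1 + 2 * p1[s]) / 4                  # q_t^(s) = (1 + 2 p_t1^(s)) / 4
        qs.append(q)
        Xs.append(X)                             # X_t,s-1
        Ys.append(Y)                             # Y_t,s-1
        if rng.random() < q:
            X = X | {s}                          # add s: X grows, Y unchanged
        else:
            Y = Y - {s}                          # drop s: Y shrinks, X unchanged
    # After all n decisions X_tn and Y_tn coincide.
    feeds = []
    for s in range(n):
        a = f(Xs[s] | {s}) - f(Xs[s])            # alpha_ts
        b = f(Ys[s] - {s}) - f(Ys[s])            # beta_ts
        feeds.append(((1 - qs[s]) * a, qs[s] * b))
    return X, feeds
```

Each element is decided exactly once, so X_(t) only grows and Y_(t) only shrinks until they meet after n steps.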

As follows from theorem D, using the subset sequence derivation process S22 in accordance with this specific example of the present invention ensures that the expected value of the (½) regret (½)Σ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^([n]) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V has the same asymptotic behavior as the function B (n,T,V) represented by the following expression (14):

$\begin{matrix}{B(n,T,V) = n\sqrt{T\left(1 + \frac{V}{n}\right)}} & (14)\end{matrix}$

Software Implementation Example

Some or all of the functions of the information processing apparatus 1 or 2 can be realized by hardware such as an integrated circuit (IC chip) or can alternatively be realized by software. In the latter case, the functions of the units of the information processing apparatus 1 or 2 are realized by, for example, a computer that executes instructions of a program that is software.

FIG. 9 illustrates an example of such a computer (hereinafter referred to as a “computer C”). As illustrated in FIG. 9, the computer C includes at least one processor C1 and at least one memory C2. The at least one memory C2 stores a program P for causing the computer C to operate as the information processing apparatus 1 or 2. In the computer C, the at least one processor C1 reads and executes the program P stored in the at least one memory C2, so that the functions of the units of the information processing apparatus 1 or 2 are realized.

Examples of the at least one processor C1 encompass a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro processing unit (MPU), a floating point number processing unit (FPU), a physics processing unit (PPU), a microcontroller, and a combination thereof. Examples of the at least one memory C2 encompass a flash memory, a hard disk drive (HDD), a solid state drive (SSD), and a combination thereof.

Note that the computer C may further include a random access memory (RAM) into which the program P is loaded during execution and in which various kinds of data are temporarily stored. The computer C may further include a communication interface through which data is transmitted and received between the computer C and at least one other apparatus. The computer C may further include an input/output interface through which (i) an input apparatus such as a keyboard and/or a mouse and/or (ii) an output apparatus such as a display and/or a printer is connected to the computer C.

The program P can be recorded in a non-transitory, tangible storage medium M capable of being read by the computer C. Examples of such a storage medium M encompass a tape, a disk, a card, a semiconductor memory, and a programmable logic circuit. The computer C can acquire the program P via the storage medium M. The program P can alternatively be transmitted via a transmission medium. Examples of such a transmission medium encompass a communication network and a broadcast wave. The computer C can alternatively acquire the program P via the transmission medium.

Application Example

The information processing apparatus 1 or 2 described earlier is applicable to various problems. Examples are shown below.

Retail

Assume that a measure concerns the respective beer prices of companies in a certain store. For example, in a case where an implemented measure X_(t)=[0,2,1, . . . ], it is assumed that the first element indicates setting the beer price of a company A to a fixed price, the second element indicates a 10% increase in the beer price of a company B from a fixed price, and the third element indicates a 10% reduction in the beer price of a company C from a fixed price.

The objective function f_(t) regards the implemented measure X_(t) as an input and regards, as an output, a result obtained by applying the implemented measure X_(t) to the respective beer prices of the companies and carrying out sales. In this case, application of the above-described optimization method makes it possible to derive an optimum setting of the respective beer prices of the companies in the store.

Investment Portfolio

The following description will discuss a case of application to an investment activity of, for example, an investor. In this case, it is assumed that the implemented measure X_(t) is investment (purchase, capital increase) with respect to a plurality of financial products (stock brands, etc.) held or to be held by the investor, or selling or holding of the plurality of financial products. For example, in a case where the implemented measure X_(t)=[1,0,2, . . . ], it is assumed that the first element indicates additional investment in stocks of a company A, the second element indicates holding (neither purchasing nor selling) receivables of a company B, and the third element indicates selling stocks of a company C. The objective function f_(t) regards the implemented measure X_(t) as the input and regards, as the output, a result obtained by applying the implemented measure X_(t) to the investment activity with respect to financial products of the companies.

In this case, application of the above-described optimization method makes it possible to derive an optimum investment activity of the investor with respect to each brand.

Clinical Trial

The following description will discuss a case of application to an administration activity for a clinical trial of a certain drug of a pharmaceutical company. In this case, it is assumed that the implemented measure X_(t) is a dose of administration or avoidance of administration. For example, in a case where the implemented measure X_(t)=[1,0,2, . . . ], it is assumed that the first element indicates that administration in a dose 1 is carried out with respect to a subject A, the second element indicates that administration is not carried out with respect to a subject B, and the third element indicates that administration in a dose 2 is carried out with respect to a subject C. The objective function f_(t) regards the implemented measure X_(t) as the input and regards, as the output, a result obtained by applying the implemented measure X_(t) to the administration activity with respect to each of the subjects.

In this case, application of the above-described optimization method makes it possible to derive an optimum administration activity with respect to each of the subjects in the clinical trial of the pharmaceutical company.

Web Marketing

The following description will discuss a case of application to an advertising activity (marketing measure) in an operating company of a certain electronic commerce site. In this case, it is assumed that the implemented measure X_(t) is advertising (an online (banner) advertisement, advertising by electronic mail, direct mail, electronic mail transmission of a discount coupon, etc.), with respect to a plurality of customers, for a product or service to be sold by the operating company. For example, in a case where the implemented measure X_(t)=[1,0,2, . . . ], it is assumed that the first element indicates a banner advertisement with respect to a customer A, the second element indicates that advertising is not carried out with respect to a customer B, and the third element indicates electronic mail transmission of a discount coupon to a customer C. The objective function f_(t) regards the implemented measure X_(t) as the input and regards, as the output, a result obtained by applying the implemented measure X_(t) to the advertising activity with respect to each of the customers. Note here that the result of implementation may be whether or not a banner advertisement has been clicked, a purchase amount, a purchase probability, or an expected value of the purchase amount.

In this case, application of the optimization method of the second example embodiment makes it possible to derive an optimum advertising activity of the operating company with respect to each of the customers.

Additional Remark 1

The present invention is not limited to the foregoing example embodiments, but may be altered in various ways by a skilled person within the scope of the claims. For example, the present invention also encompasses, in its technical scope, any example embodiment derived by appropriately combining technical means disclosed in the foregoing example embodiments.

Additional Remark 2

The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.

Supplementary Note 1

An information processing apparatus including:

- an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
- a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n, T, and V, assuming that V is a given integer not less than 0,
- where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 2

The information processing apparatus according to Supplementary note 1, wherein

- after deriving a subset X_(t) in a round t, the subset sequence derivation means is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and
- the upper limit Max (n,T,V) is given by the following expression (a):

$\begin{matrix}{{Max}(n,T,V) = 4\sqrt{T(n+2V)} + \sqrt{32T\log\left(\lceil \log T \rceil + 4\right)}} & (a)\end{matrix}$

Supplementary Note 3

The information processing apparatus according to Supplementary note 2, wherein

- the subset sequence derivation means uses (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is the largest natural number not exceeding logT+4) satisfying ∥p_(t)∥₁=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) to carry out, in each round,
- a subset derivation step of using a randomly selected u_(t)∈[0,1] to derive the subset X_(t)={i∈[n]|x_(ti)≥u_(t)}, where p_(tj) is the jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is the ith component of the vector x_(t),
- a subgradient derivation step of deriving a subgradient g_(t) at x_(t) of the objective function f_(t), and
- a vector update step of updating the vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)) in accordance with the following expression (a1) and updating the vector p_(t) in accordance with the following expression (a2):

$\begin{matrix}{y_{t+1}^{(j)} = x_t^{(j)} - \eta^{(j)}g_t,\quad x_{t+1}^{(j)} \in \arg\min_{x \in [0,1]^n}\left\| x - y_{t+1}^{(j)} \right\|_2^2} & (a1)\end{matrix}$

- where η^((j)) is a constant determined in accordance with n,

$\begin{matrix}{w_{tj} = \exp\left(-\eta\sum_{\tau \in [t]}g_{\tau}^{\top}x_{\tau}^{(j)}\right)\ (j \in [d]),\quad p_{t+1} = \frac{w_t}{\left\| w_t \right\|_1}} & (a2)\end{matrix}$

- where η is a constant determined in accordance with d and T.
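The subset derivation and subgradient derivation steps of Supplementary note 3 can be illustrated with the following Python sketch. It assumes that the "subgradient at x_(t) of the objective function f_(t)" refers to the Lovász extension of f_(t), the convex extension commonly used for online submodular minimization; the note does not name the extension, and the function names are hypothetical. Elements are indexed 0, . . . , n−1.

```python
def threshold_round(x, u):
    """X_t = {i : x_ti >= u_t} for a threshold u_t in [0, 1]."""
    return frozenset(i for i, xi in enumerate(x) if xi >= u)

def combine(p, xs):
    """x_t = sum_j p_tj x_t^(j)."""
    return [sum(pj * xj[i] for pj, xj in zip(p, xs)) for i in range(len(xs[0]))]

def lovasz_subgradient(f, x):
    """Subgradient at x of the Lovász extension of a normalized set function f.

    Sort the coordinates so that x[sigma(1)] >= ... >= x[sigma(n)], then set
    g[sigma(j)] = f({sigma(1), ..., sigma(j)}) - f({sigma(1), ..., sigma(j-1)}).
    """
    g = [0.0] * len(x)
    prefix = frozenset()
    prev = f(prefix)
    for i in sorted(range(len(x)), key=lambda k: -x[k]):
        prefix = prefix | {i}
        cur = f(prefix)
        g[i] = cur - prev
        prev = cur
    return g
```

For a normalized f and u_(t) uniform on [0,1], the expected value of f(X_(t)) over u_(t) equals the inner product of this subgradient with x_(t), which is the value of the Lovász extension at x_(t).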

Supplementary Note 4

The information processing apparatus according to Supplementary note 1, wherein

- after selecting a subset X_(t) in a round t, the subset sequence derivation means is (1) capable of referring to a value f_(t)(X_(t)) of the objective function f_(t) with respect to the selected subset X_(t) and (2) incapable of referring to a value f_(t)(X) of the objective function f_(t) with respect to a subset X∈2^(S) that is different from the selected subset, and
- the upper limit Max (n,T,V) is given by the following expression (b):

$\begin{matrix}{{{Max}\left( {n,T,V} \right)} = {{\gamma T} + {4\left( {n + 1} \right)\sqrt{\frac{T}{\gamma}}\left( {{2\sqrt{\log\log T}} + \sqrt{n + V}} \right)}}} & (b)\end{matrix}$

- where γ is a predetermined constant not less than 0 and not more than 1.

Supplementary Note 5

The information processing apparatus according to Supplementary note 4, wherein

- the subset sequence derivation means uses (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is the largest natural number not exceeding 4logT) satisfying ∥p_(t)∥₁=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) to carry out, in each round,
- a subset derivation step of (1) using a randomly selected u_(t)∈[0,1] to derive the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} or (2) using a permutation σ on the set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥ . . . ≥x_(tσ(n)) and a randomly selected s_(t)∈{0,1, . . . , n} to derive the subset X_(t)={σ(j)|j∈[s_(t)]}, where p_(tj) is the jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is the ith component of the vector x_(t), the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} being derived with a probability of 1−γ and the subset X_(t)={σ(j)|j∈[s_(t)]} being derived with a probability of γ,
- an unbiased estimator derivation step of deriving an unbiased estimator ĝ_(t) of a subgradient g_(t) at x_(t) of the objective function f_(t), and
- a vector update step of updating the vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)) in accordance with the following expression (b1) and updating the vector p_(t) in accordance with the following expression (b2):

$\begin{matrix}{y_{t+1}^{(j)} = x_t^{(j)} - \eta^{(j)}\hat{g}_t,\quad x_{t+1}^{(j)} \in \arg\min_{x \in [0,1]^n}\left\| x - y_{t+1}^{(j)} \right\|_2^2} & (b1)\end{matrix}$

- where η^((j)) is a constant determined in accordance with n,

$\begin{matrix}{w_{tj} = \exp\left(-\eta\sum_{\tau \in [t]}\hat{g}_{\tau}^{\top}x_{\tau}^{(j)}\right)\ (j \in [d]),\quad p_{t+1} = \frac{w_t}{\left\| w_t \right\|_1}} & (b2)\end{matrix}$

- where η is a constant determined in accordance with n, d, and T.
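The randomized subset derivation step of Supplementary note 5 can be sketched in Python as follows. The note does not specify the distribution of s_(t) or the construction of the unbiased estimator ĝ_(t); the sketch assumes s_(t) is uniform on {0,1, . . . , n} and only returns a flag indicating which branch was taken, leaving the estimator out. Names are hypothetical and elements are indexed 0, . . . , n−1.

```python
import random

def explore_or_round(x, gamma, rng=random):
    """Subset derivation step of Supplementary note 5 (sketch).

    With probability 1 - gamma: X_t = {i : x_ti >= u_t}, u_t uniform on [0, 1).
    With probability gamma: X_t = {sigma(1), ..., sigma(s_t)}, where sigma sorts
    x_t in non-increasing order and s_t is uniform on {0, ..., n}
    (ASSUMPTION: the note only says 'randomly selected').
    Returns (X_t, explored) so that a caller can build the estimator g_hat_t.
    """
    n = len(x)
    if rng.random() >= gamma:                       # probability 1 - gamma
        u = rng.random()
        return frozenset(i for i in range(n) if x[i] >= u), False
    sigma = sorted(range(n), key=lambda k: -x[k])   # x[sigma[0]] >= x[sigma[1]] >= ...
    s = rng.randint(0, n)                           # s_t in {0, 1, ..., n}
    return frozenset(sigma[:s]), True
```

In the exploration branch every output is a prefix of the sorted order, so only n+1 distinct subsets can occur there.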

Supplementary Note 6

An information processing apparatus including:

- an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
- a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S),
- the subset sequence derivation means using (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is the largest natural number not exceeding logT+4) satisfying ∥p_(t)∥₁=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) to carry out, in each round,
- a subset derivation step of using a randomly selected u_(t)∈[0,1] to derive a subset X_(t)={i∈[n]|x_(ti)≥u_(t)}, where p_(tj) is the jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is the ith component of the vector x_(t),
- a subgradient derivation step of deriving a subgradient g_(t) at x_(t) of the objective function f_(t), and
- a vector update step of updating the vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)) in accordance with the following expression (a1) and updating the vector p_(t) in accordance with the following expression (a2):

$\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}g_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in \lbrack 0,1\rbrack^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & ({a1})\end{matrix}$

where η^((j)) is a constant determined in accordance with n,

$\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}g_{\tau}^{T}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & ({a2})\end{matrix}$

where η is a constant determined in accordance with d and T.
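The per-round procedure of Supplementary Note 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: each f_(t) is assumed to be a Python callable on frozensets of 0-based element indices, g_(t) is taken to be the standard subgradient of the Lovász extension of f_(t) at x_(t), and the projection in (a1) is computed by clipping, which is the exact Euclidean projection onto the box [0,1]^(n). The step sizes `etas` (standing in for η^((j))) and `eta_hedge` (standing in for η), the starting point 0.5, and the names `lovasz_subgradient` and `MultiRateOGD` are hypothetical choices left to the caller.

```python
import math
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def lovasz_subgradient(f: SetFunction, x: Sequence[float]) -> List[float]:
    """Subgradient of the Lovász extension of f at x.

    Elements are visited in decreasing order of x; each coordinate receives the
    marginal value f(S_i) - f(S_{i-1}) along the chain of prefix sets S_i.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])
    g = [0.0] * len(x)
    prefix: FrozenSet[int] = frozenset()
    prev = f(prefix)
    for i in order:
        prefix = prefix | {i}
        cur = f(prefix)
        g[i] = cur - prev
        prev = cur
    return g


class MultiRateOGD:
    """d projected-gradient experts with different step sizes, mixed by exponential weights."""

    def __init__(self, n: int, etas: Sequence[float], eta_hedge: float, seed: int = 0):
        self.n = n
        self.etas = list(etas)                        # eta^(j), one per expert j in [d]
        self.eta = eta_hedge                          # eta in expression (a2)
        self.d = len(self.etas)
        self.xs = [[0.5] * n for _ in range(self.d)]  # x_t^(j); 0.5 is an arbitrary start
        self.p = [1.0 / self.d] * self.d              # p_t, starts uniform
        self.cum_loss = [0.0] * self.d                # sum over tau of g_tau^T x_tau^(j)
        self.rng = random.Random(seed)

    def combined_point(self) -> List[float]:
        # x_t = sum_j p_tj x_t^(j)
        return [sum(self.p[j] * self.xs[j][i] for j in range(self.d)) for i in range(self.n)]

    def play(self) -> Tuple[List[float], FrozenSet[int]]:
        # Subset derivation step: X_t = {i | x_ti >= u_t} with u_t drawn uniformly.
        x = self.combined_point()
        u = self.rng.random()
        return x, frozenset(i for i in range(self.n) if x[i] >= u)

    def update(self, f: SetFunction, x: Sequence[float]) -> None:
        g = lovasz_subgradient(f, x)  # subgradient derivation step
        for j in range(self.d):
            xj = self.xs[j]
            self.cum_loss[j] += sum(gi * xi for gi, xi in zip(g, xj))
            # (a1): gradient step, then Euclidean projection onto [0,1]^n (= clipping)
            self.xs[j] = [min(1.0, max(0.0, xi - self.etas[j] * gi)) for xi, gi in zip(xj, g)]
        # (a2): exponential weights; subtracting the minimum only improves numerical stability
        m = min(self.cum_loss)
        w = [math.exp(-self.eta * (loss - m)) for loss in self.cum_loss]
        total = sum(w)
        self.p = [wj / total for wj in w]
```

With a modular loss such as f(X)=Σ_(i∈X)c_(i) and c=(1, −2, 0.5), the subgradient is c itself at every point, so every expert is driven to the indicator vector of the minimizing set (the element with index 1 in the 0-based code). Keeping several step sizes side by side and letting the exponential weights choose among them is what lets the combined point track a benchmark that changes over time without knowing V in advance.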

Supplementary Note 7

An information processing apparatus including:

-   an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
-   a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S),
-   the subset sequence derivation means using (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is a maximum natural number not exceeding 4logT) satisfying |p_(t)|=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) to carry out, in each round,
-   a subset derivation step of (1) using a randomly selected u_(t)∈[0,1] to derive a subset X_(t)={i∈[n]|x_(ti)≥u_(t)} or (2) using a permutation σ on a set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥ . . . ≥x_(tσ(n)) and a randomly selected s_(t)∈{0,1, . . . ,n} to derive the subset X_(t)={σ(j)|j∈[s_(t)]}, assuming that p_(tj) is a jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is an ith component of the vector x_(t), the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} being derived with a probability of 1−γ, the subset X_(t)={σ(j)|j∈[s_(t)]} being derived with a probability of γ,
-   an unbiased estimator derivation step of deriving an unbiased estimator ĝ_(t) of a subgradient g_(t) at x_(t) of the objective function f_(t), and
-   a vector update step of updating the vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)) in accordance with the following expression (b1) and updating the vector p_(t) in accordance with the following expression (b2):

$\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}\hat{g}_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in \lbrack 0,1\rbrack^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & ({b1})\end{matrix}$

where η^((j)) is a constant determined in accordance with n,

$\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}\hat{g}_{\tau}^{T}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & ({b2})\end{matrix}$

where η is a constant determined in accordance with n, d, and T.
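Supplementary Note 7 leaves the construction of the unbiased estimator ĝ_(t) open. The sketch below uses one well-known construction, the chain-based importance-weighted estimator from Hazan and Kale's bandit algorithm for online submodular minimization, and should be read as an assumption rather than as the estimator of this invention. It relies on the fact that both branches of the subset derivation step output a prefix S_s={σ(1), . . . ,σ(s)} of the order σ, so the probability of playing S_s is ρ_s=(1−γ)(x_(tσ(s))−x_(tσ(s+1)))+γ/(n+1), with x_(tσ(0))=1 and x_(tσ(n+1))=0. Dividing the single observed value f_(t)(S_s) by ρ_s and differencing along the chain gives E[ĝ_(t)]=g_(t). The function names are hypothetical, f_(t) is assumed to be a Python callable on frozensets of 0-based indices, and γ is required to be strictly positive so that every ρ_s is nonzero.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


def chain_probabilities(x: Sequence[float], gamma: float) -> Tuple[List[int], List[float]]:
    """Return sigma (indices sorted by decreasing x) and rho_s = P(X_t = S_s), s = 0..n."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1] for the estimator to be unbiased")
    n = len(x)
    sigma = sorted(range(n), key=lambda i: -x[i])
    padded = [1.0] + [x[i] for i in sigma] + [0.0]  # x_sigma(0) := 1, x_sigma(n+1) := 0
    rho = [(1.0 - gamma) * (padded[s] - padded[s + 1]) + gamma / (n + 1) for s in range(n + 1)]
    return sigma, rho


def estimate_from(value: float, s: int, sigma: List[int], rho: List[float]) -> List[float]:
    """Importance-weighted chain estimator built from the single observation f(S_s)."""
    n = len(sigma)
    f_hat = [0.0] * (n + 1)
    f_hat[s] = value / rho[s]
    g_hat = [0.0] * n
    for i in range(1, n + 1):
        g_hat[sigma[i - 1]] = f_hat[i] - f_hat[i - 1]
    return g_hat


def play_and_estimate(f: SetFunction, x: Sequence[float], gamma: float,
                      rng: random.Random) -> Tuple[FrozenSet[int], List[float]]:
    n = len(x)
    sigma, rho = chain_probabilities(x, gamma)
    if rng.random() < 1.0 - gamma:
        # Branch (1): threshold rounding; {i | x_i >= u} is always a prefix of sigma.
        u = rng.random()
        s = sum(1 for i in range(n) if x[i] >= u)
    else:
        # Branch (2): uniformly random prefix length s_t in {0, 1, ..., n}.
        s = rng.randint(0, n)
    subset = frozenset(sigma[:s])
    return subset, estimate_from(f(subset), s, sigma, rho)  # only f_t(X_t) is observed


def exact_expectation(f: SetFunction, x: Sequence[float], gamma: float) -> List[float]:
    """E[g_hat], computed by enumerating every prefix S_s with its probability rho_s."""
    sigma, rho = chain_probabilities(x, gamma)
    mean = [0.0] * len(x)
    for s in range(len(x) + 1):
        g_hat = estimate_from(f(frozenset(sigma[:s])), s, sigma, rho)
        mean = [m + rho[s] * g for m, g in zip(mean, g_hat)]
    return mean
```

`exact_expectation` is included only as a check: for any submodular f it reproduces the full-information subgradient f(S_i)−f(S_(i−1)) coordinate by coordinate, even though each call to `play_and_estimate` evaluates f on a single set.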

Supplementary Note 8

An information processing apparatus including:

-   an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
-   a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T) satisfying the following condition β1 or β2:
-   the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0,
-   the condition β2 being that the asymptotic behavior of the expected value of the α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.
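The admissibility conditions that Supplementary Note 8 places on the benchmark (at most k elements per round, and a total Hamming path length of at most V) and the realized α-regret of one run can be checked with a few lines of Python. This is an illustrative sketch only: sets are represented as frozensets of 0-based element indices, each f_(t) is assumed to be a Python callable, and the function names are hypothetical. The expected value in the note would be estimated by averaging `alpha_regret` over independent runs.

```python
from typing import Callable, FrozenSet, Optional, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def hamming(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """d_H(A, B) = |A ∪ B| - |A ∩ B|, i.e. the size of the symmetric difference."""
    return len(a | b) - len(a & b)


def path_length(benchmark: Sequence[FrozenSet[int]]) -> int:
    """Sum over t in [T-1] of d_H(X_t*, X_{t+1}*)."""
    return sum(hamming(benchmark[t], benchmark[t + 1]) for t in range(len(benchmark) - 1))


def is_admissible(benchmark: Sequence[FrozenSet[int]], V: int, k: Optional[int] = None) -> bool:
    """Benchmark class of condition beta1 when k is given, of condition beta2 when k is None."""
    if k is not None and any(len(X) > k for X in benchmark):
        return False
    return path_length(benchmark) <= V


def alpha_regret(fs: Sequence[SetFunction], played: Sequence[FrozenSet[int]],
                 benchmark: Sequence[FrozenSet[int]], alpha: float) -> float:
    """alpha * sum_t f_t(X_t*) - sum_t f_t(X_t) for one realized run."""
    best = sum(f(X) for f, X in zip(fs, benchmark))
    got = sum(f(X) for f, X in zip(fs, played))
    return alpha * best - got
```

For instance, the benchmark {1}, {1,2}, {2} changes by one element at each of its two transitions, so its path length is 2: it is admissible for V=2 and k=2, but not for V=1.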

Supplementary Note 9

The information processing apparatus according to Supplementary Note 8, wherein

-   after deriving a subset X_(t) in a round t, the subset sequence derivation means is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and
-   the function A (k,T,V) is given by the following expression (c):

$\begin{matrix}{A(k,T,V) = \sqrt{kT(k + V)}} & (c)\end{matrix}$

Supplementary Note 10

The information processing apparatus according to Supplementary Note 9, wherein

-   the subset sequence derivation means carries out, in each round t,
-   a subset derivation step of deriving the subset X_(t)=X_(tk) by setting X_(t0) to X_(t0)=Ø and then repeatedly carrying out a process for using an element i_(ts) derived from a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to generate X_(ts)=X_(t,s−1)∪{i_(ts)}, and
-   a feed generation step of generating, in accordance with l_(ti)^((s))=f_(t)(X_(t,s−1)∪{i})−f_(t)(X_(t,s−1)) (i∈[n]), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s)), . . . ,l_(tn)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).
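The round structure of Supplementary Note 10, in which k expert modules are chained greedily, can be sketched as follows. The excerpt does not specify the FSF algorithm executed by each module FSF*^((s)); here it is replaced by a hypothetical fixed-share exponential-weights forecaster over the n elements that treats the feed l_(t)^((s)) as gains, and the element i_(ts) is assumed to be sampled from p_(t)^((s)). The class names and the parameters `eta` and `share` are assumptions made for illustration, and f_(t) is assumed to be a Python callable on frozensets of 0-based indices.

```python
import math
import random
from typing import Callable, FrozenSet, List

SetFunction = Callable[[FrozenSet[int]], float]


class FixedShareGains:
    """Hypothetical stand-in for FSF*^(s): exponential weights on gains plus fixed-share mixing."""

    def __init__(self, n_actions: int, eta: float, share: float):
        self.n, self.eta, self.share = n_actions, eta, share
        self.p = [1.0 / n_actions] * n_actions

    def feed(self, gains: List[float]) -> None:
        top = max(gains)  # shift for numerical stability; cancels after normalization
        w = [pi * math.exp(self.eta * (g - top)) for pi, g in zip(self.p, gains)]
        total = sum(w)
        self.p = [(1.0 - self.share) * wi / total + self.share / self.n for wi in w]


class OnlineGreedy:
    """k chained modules; module s picks the s-th element given the prefix X_{t,s-1}."""

    def __init__(self, n: int, k: int, eta: float, share: float, seed: int = 0):
        self.n, self.k = n, k
        self.experts = [FixedShareGains(n, eta, share) for _ in range(k)]
        self.rng = random.Random(seed)
        self.prefixes: List[FrozenSet[int]] = []

    def play(self) -> FrozenSet[int]:
        X: FrozenSet[int] = frozenset()              # X_t0 = empty set
        self.prefixes = [X]
        for expert in self.experts:
            i = self.rng.choices(range(self.n), weights=expert.p)[0]  # i_ts sampled from p_t^(s)
            X = X | {i}                              # X_ts = X_{t,s-1} ∪ {i_ts}
            self.prefixes.append(X)
        return X                                     # X_t = X_tk, so |X_t| <= k

    def feedback(self, f: SetFunction) -> None:
        for s, expert in enumerate(self.experts):
            base = self.prefixes[s]                  # X_{t,s-1} in the note's 1-based indexing
            f_base = f(base)
            expert.feed([f(base | {i}) - f_base for i in range(self.n)])  # l_ti^(s)
```

With a coverage function as f_(t), the first module quickly concentrates on the element with the largest coverage, and each later module learns the best marginal addition given the prefix it actually sees, which mirrors the offline greedy method for cardinality-constrained maximization.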

Supplementary Note 11

The information processing apparatus according to Supplementary Note 8, wherein

-   after deriving a subset X_(t) in a round t, the subset sequence derivation means is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and
-   the function B (n,T,V) is given by the following expression (d):

$\begin{matrix}{B(n,T,V) = \sqrt{T\left( 1 + V/n \right)}} & (d)\end{matrix}$
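Expressions (c) and (d) can be evaluated directly to see how the guarantees scale; the helper names below are hypothetical. Dividing by T gives A(k,T,V)/T=√(k(k+V)/T) and B(n,T,V)/T=√((1+V/n)/T), so for fixed k and n the average α-regret per round vanishes as T grows whenever the path-length budget V grows more slowly than T.

```python
import math


def bound_A(k: int, T: int, V: int) -> float:
    """Expression (c): A(k, T, V) = sqrt(k T (k + V))."""
    return math.sqrt(k * T * (k + V))


def bound_B(n: int, T: int, V: int) -> float:
    """Expression (d): B(n, T, V) = sqrt(T (1 + V / n))."""
    return math.sqrt(T * (1 + V / n))


# Per-round averages for a fixed benchmark (V = 0) versus a path length proportional to T.
for T in (10**2, 10**4, 10**6):
    print(T, bound_A(5, T, 0) / T, bound_A(5, T, T) / T)
```

The first column of averages shrinks like 1/√T, while the second stays roughly constant, which is the expected behaviour when the benchmark is allowed to change in a constant fraction of the rounds.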

Supplementary Note 12

The information processing apparatus according to Supplementary Note 11, wherein

-   the subset sequence derivation means carries out, in each round t,
-   a subset derivation step of deriving the subset X_(t)=X_(tn)=Y_(tn) by repeatedly carrying out a process for (1) setting X_(t0) to X_(t0)=Ø and setting Y_(t0) to Y_(t0)=[n], (2) using a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to set q_(t)^((s)) to q_(t)^((s))=(1+2p_(t1)^((s)))/4, and (3a) with a probability q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1)∪{s} and setting Y_(ts) to Y_(ts)=Y_(t,s−1) or (3b) with a probability 1−q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1) and setting Y_(ts) to Y_(ts)=Y_(t,s−1)\{s}, and
-   a feed generation step of setting α_(ts) to α_(ts)=f_(t)(X_(t,s−1)∪{s})−f_(t)(X_(t,s−1)) and setting β_(ts) to β_(ts)=f_(t)(Y_(t,s−1)\{s})−f_(t)(Y_(t,s−1)), and then generating, in accordance with l_(t1)^((s))=(1−q_(t)^((s)))α_(ts) and l_(t2)^((s))=q_(t)^((s))β_(ts), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).
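Supplementary Note 12 follows the pattern of the randomized double greedy method for unconstrained submodular maximization: one two-action module per element s decides whether s is added to X or removed from Y. The sketch below substitutes a hypothetical fixed-share forecaster for FSF*^((s)), treats the two feed components as gains for "add" and "remove", and uses 0-based element indices; these are assumptions, not details taken from the note. Because p_(t1)^((s))∈[0,1], the rule q_(t)^((s))=(1+2p_(t1)^((s)))/4 keeps every inclusion probability in [1/4, 3/4].

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple

SetFunction = Callable[[FrozenSet[int]], float]


class FixedShareGains:
    """Hypothetical stand-in for FSF*^(s): exponential weights on gains plus fixed-share mixing."""

    def __init__(self, n_actions: int, eta: float, share: float):
        self.n, self.eta, self.share = n_actions, eta, share
        self.p = [1.0 / n_actions] * n_actions

    def feed(self, gains: List[float]) -> None:
        top = max(gains)  # shift for numerical stability; cancels after normalization
        w = [pi * math.exp(self.eta * (g - top)) for pi, g in zip(self.p, gains)]
        total = sum(w)
        self.p = [(1.0 - self.share) * wi / total + self.share / self.n for wi in w]


class OnlineDoubleGreedy:
    def __init__(self, n: int, eta: float, share: float, seed: int = 0):
        self.n = n
        self.experts = [FixedShareGains(2, eta, share) for _ in range(n)]
        self.rng = random.Random(seed)
        self.trace: List[Tuple[FrozenSet[int], FrozenSet[int], float]] = []

    def play(self) -> FrozenSet[int]:
        X, Y = frozenset(), frozenset(range(self.n))  # X_t0 = empty set, Y_t0 = [n]
        self.trace = []
        for s, expert in enumerate(self.experts):
            q = (1.0 + 2.0 * expert.p[0]) / 4.0       # q_t^(s) = (1 + 2 p_t1^(s)) / 4
            self.trace.append((X, Y, q))              # remember X_{t,s-1}, Y_{t,s-1}
            if self.rng.random() < q:
                X = X | {s}                           # (3a) add s to X
            else:
                Y = Y - {s}                           # (3b) remove s from Y
        assert X == Y                                 # X_tn = Y_tn once every element is decided
        return X

    def feedback(self, f: SetFunction) -> None:
        for s, (expert, (X, Y, q)) in enumerate(zip(self.experts, self.trace)):
            alpha = f(X | {s}) - f(X)                 # alpha_ts
            beta = f(Y - {s}) - f(Y)                  # beta_ts
            expert.feed([(1.0 - q) * alpha, q * beta])  # feed (l_t1^(s), l_t2^(s))
```

For a modular test function with positive and negative weights, the module of each positive-weight element learns to favour "add" and the module of each negative-weight element learns to favour "remove", while the [1/4, 3/4] range of q keeps some randomization in every round.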

Supplementary Note 13

An information processing apparatus including:

-   an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
-   a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S),
-   the subset sequence derivation means carrying out, in each round t,
-   a subset derivation step of deriving a subset X_(t)=X_(tk) by setting X_(t0) to X_(t0)=Ø and then repeatedly carrying out a process for using an element i_(ts) derived from a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to generate X_(ts)=X_(t,s−1)∪{i_(ts)}, and
-   a feed generation step of generating, in accordance with l_(ti)^((s))=f_(t)(X_(t,s−1)∪{i})−f_(t)(X_(t,s−1)) (i∈[n]), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s)), . . . ,l_(tn)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).

Supplementary Note 14

An information processing apparatus including:

-   an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and
-   a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S),
-   the subset sequence derivation means carrying out, in each round t,
-   a subset derivation step of deriving a subset X_(t)=X_(tn)=Y_(tn) by repeatedly carrying out a process for (1) setting X_(t0) to X_(t0)=Ø and setting Y_(t0) to Y_(t0)=[n], (2) using a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to set q_(t)^((s)) to q_(t)^((s))=(1+2p_(t1)^((s)))/4, and (3a) with a probability q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1)∪{s} and setting Y_(ts) to Y_(ts)=Y_(t,s−1) or (3b) with a probability 1−q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1) and setting Y_(ts) to Y_(ts)=Y_(t,s−1)\{s}, and
-   a feed generation step of setting α_(ts) to α_(ts)=f_(t)(X_(t,s−1)∪{s})−f_(t)(X_(t,s−1)) and setting β_(ts) to β_(ts)=f_(t)(Y_(t,s−1)\{s})−f_(t)(Y_(t,s−1)), and then generating, in accordance with l_(t1)^((s))=(1−q_(t)^((s)))α_(ts) and l_(t2)^((s))=q_(t)^((s))β_(ts), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).

Supplementary Note 15

An information processing method including: setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and

-   deriving a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 16

An information processing method including: setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and

-   deriving a subset sequence X₁, X₂, . . . , X_(T) satisfying the following condition β1 or β2:
-   the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0,
-   the condition β2 being that the asymptotic behavior of the expected value of the α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 17

A program for causing a computer to operate as an information processing apparatus,

-   the program causing the computer to function as: an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 18

A computer-readable storage medium storing the program according to Supplementary Note 17.

Supplementary Note 19

A program for causing a computer to operate as an information processing apparatus,

-   the program causing the computer to function as: an objective function setting means that sets, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation means that derives a subset sequence X₁, X₂, . . . , X_(T) satisfying the following condition β1 or β2:
-   the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0,
-   the condition β2 being that the asymptotic behavior of the expected value of the α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 20

A computer-readable storage medium storing the program according to Supplementary Note 19.

Supplementary Note 21

An information processing apparatus including at least one processor, the at least one processor carrying out: an objective function setting process for setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation process for deriving a subset sequence X₁, X₂, . . . , X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0, where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 22

An information processing apparatus including at least one processor, the at least one processor carrying out: an objective function setting process for setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation process for deriving a subset sequence X₁, X₂, . . . , X_(T) satisfying the following condition β1 or β2:

-   the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0,
-   the condition β2 being that the asymptotic behavior of the expected value of the α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0,
-   where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.

Supplementary Note 23

Note that any of these information processing apparatuses may further include a memory, which may store a program for causing the at least one processor to carry out the objective function setting process and the subset sequence derivation process. Note also that the program may be recorded in a non-transitory, tangible computer-readable storage medium.

REFERENCE SIGNS LIST

-   1, 2 Information processing apparatus
-   11, 21 Objective function setting unit (objective function setting means)
-   12, 22 Subset sequence derivation unit (subset sequence derivation means)
-   S1, S2 Information processing method
-   S11, S21 Objective function setting process
-   S12, S22 Subset sequence derivation process

What is claimed is:
 1. An information processing apparatus comprising at least one processor, the at least one processor carrying out: an objective function setting process for setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation process for deriving a subset sequence X₁,X₂, . . . ,X_(T)∈2^(S) in which an expected value of regret Σ_(t∈[T])f_(t)(X_(t))−Σ_(t∈[T])f_(t)(X_(t)*) with respect to any benchmark X₁*,X₂*, . . . ,X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V is not more than an upper limit Max (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0, where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.
 2. The information processing apparatus according to claim 1, wherein in the subset sequence derivation process, after deriving a subset X_(t) in a round t, the at least one processor is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and the upper limit Max (n,T,V) is given by the following expression (a): $\begin{matrix}{{Max}(n,T,V) = 4\sqrt{T(n + 2V)} + \sqrt{32T\log\left( \lceil \log T \rceil + 4 \right)}} & (a)\end{matrix}$
 3. The information processing apparatus according to claim 2, wherein in the subset sequence derivation process, the at least one processor uses (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is a maximum natural number not exceeding logT+4) satisfying |p_(t)|=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾,x_(t)⁽²⁾, . . . ,x_(t)^((d))∈R^(n) to carry out, in each round, a subset derivation step of using a randomly selected u_(t)∈[0,1] to derive the subset X_(t)={i∈[n]|x_(ti)≥u_(t)}, assuming that p_(tj) is a jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is an ith component of the vector x_(t), a subgradient derivation step of deriving a subgradient g_(t) at x_(t) of the objective function f_(t), and a vector update step of updating the vectors x_(t)⁽¹⁾,x_(t)⁽²⁾, . . . ,x_(t)^((d)) in accordance with the following expression (a1) and updating the vector p_(t) in accordance with the following expression (a2): $\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}g_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in \lbrack 0,1\rbrack^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & ({a1})\end{matrix}$ where η^((j)) is a constant determined in accordance with n, $\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}g_{\tau}^{T}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & ({a2})\end{matrix}$ where η is a constant determined in accordance with d and T.
 4. The information processing apparatus according to claim 1, wherein in the subset sequence derivation process, after selecting a subset X_(t) in a round t, the at least one processor is (1) capable of referring to a value f_(t)(X_(t)) of the objective function f_(t) with respect to the selected subset X_(t) and (2) incapable of referring to a value f_(t)(X) of the objective function f_(t) with respect to a subset X∈2^(S) that is different from the selected subset, and the upper limit Max (n,T,V) is given by the following expression (b): $\begin{matrix} {{{Max}\left( {n,T,V} \right)} = {{\gamma T} + {4\left( {n + 1} \right)\sqrt{\frac{T}{\gamma}}\left( {{2\sqrt{\log\log T}} + \sqrt{n + V}} \right)}}} & (b) \end{matrix}$ where γ is a predetermined constant not less than 0 and not more than 1.
 5. The information processing apparatus according to claim 4, wherein in the subset sequence derivation process, the at least one processor uses (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is a maximum natural number not exceeding 4logT) satisfying |p_(t)|=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾,x_(t)⁽²⁾, . . . ,x_(t)^((d))∈R^(n) to carry out, in each round, a subset derivation step of (1) using a randomly selected u_(t)∈[0,1] to derive the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} or (2) using a permutation σ on a set [n] satisfying x_(tσ(1))≥x_(tσ(2))≥ . . . ≥x_(tσ(n)) and a randomly selected s_(t)∈{0,1, . . . ,n} to derive the subset X_(t)={σ(j)|j∈[s_(t)]}, assuming that p_(tj) is a jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is an ith component of the vector x_(t), the subset X_(t)={i∈[n]|x_(ti)≥u_(t)} being derived with a probability of 1−γ, the subset X_(t)={σ(j)|j∈[s_(t)]} being derived with a probability of γ, an unbiased estimator derivation step of deriving an unbiased estimator ĝ_(t) of a subgradient g_(t) at x_(t) of the objective function f_(t), and a vector update step of updating the vectors x_(t)⁽¹⁾,x_(t)⁽²⁾, . . . ,x_(t)^((d)) in accordance with the following expression (b1) and updating the vector p_(t) in accordance with the following expression (b2): $\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}\hat{g}_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in \lbrack 0,1\rbrack^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & ({b1})\end{matrix}$ where η^((j)) is a constant determined in accordance with n, $\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}\hat{g}_{\tau}^{T}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & ({b2})\end{matrix}$ where η is a constant determined in accordance with n, d, and T.
 6. An information processing apparatus comprising at least one processor, the at least one processor carrying out: an objective function setting process for setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation process for deriving a subset sequence X₁,X₂, . . . ,X_(T)∈2^(S), in the subset sequence derivation process, the at least one processor using (i) a d-dimensional vector p_(t)∈[0,1]^(d) (d is a maximum natural number not exceeding logT+4) satisfying |p_(t)|=1 and (ii) d n-dimensional vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d))∈R^(n) to carry out, in each round, a subset derivation step of using a randomly selected u_(t)∈[0,1] to derive a subset X_(t)={i∈[n]|x_(ti)≥u_(t)}, assuming that p_(tj) is a jth component of the vector p_(t), x_(t) is an n-dimensional vector defined by x_(t)=Σ_(j∈[d])p_(tj)x_(t)^((j)), and x_(ti) is an ith component of the vector x_(t), a subgradient derivation step of deriving a subgradient g_(t) at x_(t) of the objective function f_(t), and a vector update step of updating the vectors x_(t)⁽¹⁾, x_(t)⁽²⁾, . . . , x_(t)^((d)) in accordance with the following expression (a1) and updating the vector p_(t) in accordance with the following expression (a2): $\begin{matrix}{y_{t + 1}^{(j)} = x_{t}^{(j)} - \eta^{(j)}g_{t},\quad x_{t + 1}^{(j)} \in \arg\min\limits_{x \in \lbrack 0,1\rbrack^{n}}\left\| x - y_{t + 1}^{(j)} \right\|_{2}^{2}} & ({a1})\end{matrix}$ where η^((j)) is a constant determined in accordance with n, $\begin{matrix}{w_{tj} = \exp\left( - \eta\sum\limits_{\tau \in \lbrack t\rbrack}g_{\tau}^{T}x_{\tau}^{(j)} \right)\ \left( j \in \lbrack d\rbrack \right),\quad p_{t + 1} = \frac{w_{t}}{\left\| w_{t} \right\|_{1}}} & ({a2})\end{matrix}$ where η is a constant determined in accordance with d and T.
 7. (canceled)
 8. An information processing apparatus comprising at least one processor, the at least one processor carrying out: an objective function setting process for setting, as an objective function f_(t) in each round t∈[T] (T is any natural number), a normalized submodular function on a power set 2^(S) of a set S consisting of n elements (n is any natural number); and a subset sequence derivation process for deriving a subset sequence X₁,X₂, . . . ,X_(T) satisfying the following condition β1 or β2: the condition β1 being that each subset X_(t) satisfies |X_(t)|≤k, assuming that k is a given natural number, and that an asymptotic behavior of an expected value of α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying |X_(t)*|≤k and Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function A (k,T,V) determined from k,T,V, assuming that V is a given integer not less than 0, the condition β2 being that the asymptotic behavior of the expected value of the α-regret αΣ_(t∈[T])f_(t)(X_(t)*)−Σ_(t∈[T])f_(t)(X_(t)) with respect to any benchmark X₁*, X₂*, . . . , X_(T)*∈2^(S) satisfying Σ_(t∈[T−1])d_(H)(X_(t)*,X_(t+1)*)≤V coincides with an asymptotic behavior of a function B (n,T,V) determined from n,T,V, assuming that V is a given integer not less than 0, where d_(H)(X_(t)*,X_(t+1)*) is a Hamming distance between subsets, the Hamming distance being defined by d_(H)(X_(t)*,X_(t+1)*)=|X_(t)*∪X_(t+1)*|−|X_(t)*∩X_(t+1)*|.
 9. The information processing apparatus according to claim 8, wherein in the subset sequence derivation process, after deriving a subset X_(t) in a round t, the at least one processor is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and the function A (k,T,V) is given by the following expression (c): $\begin{matrix}{A(k,T,V) = \sqrt{kT(k + V)}} & (c)\end{matrix}$
 10. The information processing apparatus according to claim 9, wherein in the subset sequence derivation process, the at least one processor carries out, in each round t, a subset derivation step of deriving the subset X_(t)=X_(tk) by setting X_(t0) to X_(t0)=Ø and then repeatedly carrying out a process for using an element i_(ts) derived from a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to generate X_(ts)=X_(t,s−1)∪{i_(ts)}, and a feed generation step of generating, in accordance with l_(ti)^((s))=f_(t)(X_(t,s−1)∪{i})−f_(t)(X_(t,s−1)) (i∈[n]), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s)), . . . ,l_(tn)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).
 11. The information processing apparatus according to claim 8, wherein in the subset sequence derivation process, after deriving a subset X_(t) in a round t, the at least one processor is capable of referring to a value f_(t)(X) of the objective function f_(t) with respect to any subset X∈2^(S), and the function B (n,T,V) is given by the following expression (d): $\begin{matrix}{B(n,T,V) = \sqrt{T\left( 1 + V/n \right)}} & (d)\end{matrix}$
 12. The information processing apparatus according to claim 11, wherein in the subset sequence derivation process, the at least one processor carries out, in each round t, a subset derivation step of deriving the subset X_(t)=X_(tn)=Y_(tn) by repeatedly carrying out a process for (1) setting X_(t0) to X_(t0)=Ø and setting Y_(t0) to Y_(t0)=[n], (2) using a vector p_(t)^((s)) output by an FSF algorithm execution module FSF*^((s)) to set q_(t)^((s)) to q_(t)^((s))=(1+2p_(t1)^((s)))/4, and (3a) with a probability q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1)∪{s} and setting Y_(ts) to Y_(ts)=Y_(t,s−1) or (3b) with a probability 1−q_(t)^((s)), setting X_(ts) to X_(ts)=X_(t,s−1) and setting Y_(ts) to Y_(ts)=Y_(t,s−1)\{s}, and a feed generation step of setting α_(ts) to α_(ts)=f_(t)(X_(t,s−1)∪{s})−f_(t)(X_(t,s−1)) and setting β_(ts) to β_(ts)=f_(t)(Y_(t,s−1)\{s})−f_(t)(Y_(t,s−1)), and then generating, in accordance with l_(t1)^((s))=(1−q_(t)^((s)))α_(ts) and l_(t2)^((s))=q_(t)^((s))β_(ts), a feed l_(t)^((s))=(l_(t1)^((s)),l_(t2)^((s))) to be input to the FSF algorithm execution module FSF*^((s)).
 13-14. (canceled)